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Abstract 

An R package for specifying and estimating linear latent variable models is 
presented. The philosophy of the implementation is to separate the model 
specification from the actual data, which leads to a dynamic and easy way 
of modeling complex hierarchical structures. Several advanced features are 
implemented including robust standard errors for clustered correlated data, 
multigroup analyses, non-linear parameter constraints, inference with in- 
complete data, maximum likelihood estimation with censored and binary 
observations, and instrumental variable estimators. In addition an exten- 
sive simulation interface covering a broad range of non-linear generalized 
structural equation models is described. The model and software are demon- 
strated in data of measurements of the serotonin transporter in the human 
brain. 

Keywords: latent variable model, maximum likelihood, multigroup 
analysis, structural equation model, R, serotonin, SERT 



1. Introduction 

Multivariate data are often modelled using random effects in order to 
account for correlation between measurements in the st atistical analysis . 



The dominating model is the linear mixed effects model flLaird and Ware 



1982), which is available in most standard statist ical software packag e s, e.g 



SAS PROC MIXED and in R in the pack ages nlme (jPinheiro and Bated . |2000| ) 



and lme4 (IBates and Maechlerl . 120091 ) with the latter one also offering some 



support for generalized linear mixed models. 
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Ano ther type of r andom effect model is the structural equation model 
(SEM) (IBollenl . Il989l ). where the terminology latent variable often is used 
instead of random effects. While the mixed effect model and structural 
equation model have many aspects in common, the aim of a SEM analysis 
is typically to analyze the association between the latent variable, represent- 
ing some process that is only partially observed, and some other variables 
(observed or latent). Thus, in SEMs focus is mainly on the latent variable, 
where one normally ascribes less interpretation to the random effects in a 
mixed effects model, which primarily serves as a way of capturing covari- 
ance between measurements. Because observed variables can be viewed as 
representations of underlying true variables SEMs offer a natural framework 
for handling measurement errors in study v ariables, and it often provide s 
an efficient analysis of high dimensional data (IBudtz-Jetrgensen et all . 120031 ) . 
The framework was pioneered by JAureskog (jJoreskogl . Il970l ) and since then 
it has been an active area of research with focus on r elaxing linearity and 
distributional assumptions ( IRabe-Hesketh et all . l20Q4f ) . 

SEMs have proven to be useful in many different fields of research. How- 
ever, applications have been dominated by covariance structure analyses, 
and as residuals on the individual level are not available in this setup, model 
assessment has been based on more or less heuristic omnibus tests. The lack 
of profound mod el diagnostics have undoubtedly lead to several poor appli- 
cations of SEMs (jSteigerl . l200ll ). and thus likely, however unjustified, giving 
the framework a somewhat bad reputation among groups of statisticians. 

Several properitary software solutions are available for analyzing SEMs 
with some of the most popul ar being LISREL, SAS PROC CALIS, AMD S, EQS, 



Stat a 12 sem, Stata gllamm (IRabe-Hesketh et all . 120041 ) and Mplus fjMuthen and Muthenl . 



20071 ). where the last two programs stand-out because of their general mod- 
elling framework. Common for all these solutions is that they are difficult 
to extend and therefore possibilities for examination of new methodologi- 
cal ideas are limited. In part because details of many features of properi- 
tary software often remains hidden from the user. Implementat i ons in an 
open-source environment such as R (IR Development Core Teaml . 120101 ) di- 
rectly addres s this prob l em. C urrently two such solution s are available: th e 
sem package (JFoxJ, 120061 . 120091 ) and the OpenMx package (IBoker et all . 1201 lh . 
While the former package is limited to standard covariance structure anal- 
ysis, OpenMx offers sophisticated methods such as multiple group analysis, 
models for ordinal data and mixture models. The predecessor package Mx 
has been a popular for analyzing family data in epidemiological genetics and 
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OpenMx will undoubtedly become an important tool in that field of research. 

This paper presents the lava-package for statistical analysis in a very 
general modelling framework known as the Linear Latent Variable Model, 
which includes structural equation models and mixed models as important 
special cases. This model class also allows for non-linear effects of covariates 
and non-linear parameter constraints. The lava-package offers a superior 
user interface for specifying, altering and visualizing the model design. Mod- 
els are specified independently of data using commands that are similar to 
standard regression modeling in R. In addition path diagrams can be gener- 
ated to help give the user an overview of the assumptions specified. Further, 
the package gives access to an extensive simulation procedure which covers, 
but is not limited to, linear latent variable models. This tool will be ex- 
tremely useful e.g. for understanding the biases caused by different types 
of model misspecification. The lava-package also includes sophisticated in- 
ferential methods such as multigroup analyses, robust standard errors for 
clustered correlated data, maximum likelihood based inference with data 
missing at random and inference for indirect and total effects. In addition 
advanced model diagnostic techniques for str uctural equa tion models (fit- 



ted in lava) are available via the gof -package (IHolstl . 120121 ). and extensions 



to models fo r censored an d binary outcomes are available via the package 



lava.tobit (IHolstl . l201ll ) covered briefly in this article. 

Modular programming has been a key concept during the software de- 
velopment thus making the process of extending the program (e.g. im- 
plementing new estimators, changing optimization routines etc.) easy, as 
exemplified by the above mentioned add-on packages. 

Our hope is that the package will serve as a platform for testing, devel- 
oping and sharing new ideas in the field of latent variable models. 

2. Linear Latent Variable Models 

We will define the Linear Latent Variable Model as the model defined 
by a Measurement part describing the responses Yi = (Yn, . . . , Y ip )': 

l q I 

Yij = Vj + XjkVik + Kj r X ir + ^jkVijk^ik + Qj, (1) 

k=l r=l k=l 

and a structural part describing the latent variables rji = (rjn, . . . ,%)': 

l q I 

7] is = a s + ^ PskVik + IsrXir + ^ T sk W isk r] ik + (is, (2) 
k=l r=l k=l 
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where i = l,...,n is the index of the sampling unit (e.g. individuals), 
j = 1, . . . ,p is the index of the observed variables (measurements or within 
cluster observations) and s = 1, . . . , I is the index of the / distinct latent 
variables. In a more compact matrix notation the model can be written as 

Y = v + \r) t + KX t + (A V-)rfe + 6i, (3) 



77, = a + £77, + TXt + (T W^r/* + 0, (4) 

where u G W and ol G R' are intercepts, and A, A G R pxZ , K G W xq , 
r G M^ 9 , £?, T G M ix/ are regression coefficient matrices (defining both 
fixed effects and random slopes), and Xj G M 9 , G M pxi , W, G lR Zx ' are 
covariates. The denotes the Schur product (element- wise multiplication). 
The residual terms follow multivariate normal distributions, ~ J\f p (0, S e ) 
and Ci ~ A^(0, ^), which typically are assumed to be independent. 

Note that the terms including the covariates V; and W, define random 
slope components as in the Laird- Ware mixed model f ormulation and differ - 



entiates the LLVM from the usual SEM formulation (ISanchez et all . 120051 ) . 
Many cases can however be modeled without such terms, resulting in a 
more computational efficient model formulation with constant variance be- 
tween individuals. In the following we will therefore initially assume that 
the model is parameterized by some 6 defining the matrices 

(u, a, A, K, B, T, S e , (5) 

with some restrictions on the parameter space to guarantee identification 
(obviously zeroes in the diagonal of B), and possibly non-linear constraints 
between the different parameters. In Section 1231 we will return to the general 
case, and demonstrate how to write up the model in atoms adapted to the 
case without any interaction terms. Note that we will allow non-linear 
constraints on the parameters between any of the elements in (J5j). 

In the setup with A = and T = it follows that the mean and 
variance of Y, given the covariates are 

Hi{0) = E(Y I Xi) = u+ A(l - B) 'a 

+ [A(l - B)-^ + K] Xi, {[> 



S = Var(l^ | Xi) = A(l - B) x *(l - B) -1/ A' + S e , (7) 
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where the fundamental property of the normal distribution that the marginals 
also are normal is exploited. Inference a bout can t hen be obtained by 
maximizing the corresponding likelihood (iBollenl . Il989l ) 



n , 

L(0; Y, X) oc J] exp -- {Y, - ^(0))' S 1 {Y, - /i,(0)) 
i=i ^ 



2.1. Implementation 

From a mathematical and implementation-wise point of view it is con- 
venient to supplement the model for mulation with the equivalent R e ticu- 
lar A ction Model (RAM) formulation flMcArdle and McDonaldl . ll984J : Ifox . 



20091 ). which have a direct connection with the underlying path diagram 



and also explicitly covers path analysis models. Let U be the stochastic 
vector including the latent variables, r], 

U=(Zx,..., Z p+qj 77!, ... , m )' = (Z, t/)', (8) 

where Z = (Z\, . . . , Z p+q )' is the stochastic vector containing all observed 
variables Z = (Y" 1? . . . , Y p , X 1; . . . , X q )' . The RAM formulation states that 

U = v g + A U + e, (9) 

where v e describes the intercepts, and e is a residual term assumed to follow 
a zero-mean normal distribution with 

Var(e) = P . (10) 

Hence the model is completely specified by the matrices vg, Pg and Ag 
where the matrices generally are sparse, and the latter have zeros in the 
diagonal. In graph-terms the matrix Ag represents the asymmetric paths 
whereas Pg represent the symmetric paths. 

Let k be the total number of variables in the model (i.e. Ag 6 R fcxfc ). 
We let J be the matrix that picks out the observed variables from U (see 
Section |Appendix~A] ) , and define 

Gg = J{l-Ag)-\ (11) 

Now it follows that 

Var(Z) = U e = GgPgG'g. (12) 
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and similarly the mean of the observed variables is then specified by the 
model structure as: 

E(Z) = $ = GgVg. (13) 

The parameter can then be estimated by maximizing the log-likelihood, 
I, for the full data vector Z 

Oml = arg max e i(0 \ Z = z). (14) 

At a first glance this formulation seems restrictive in the sense that the 
covariates also have to be normally distributed. However, if we split the 
parameter vector into 6 = (#i, 62), where 0\ parameterizes the conditional 
distribution of Y given X and 62 are the mean and variance parameters 
of the covariates, then by Bayes formula the probability density for the 
joint distribution can be written as the product of the conditional density 
multiplied by the marginal density: 

fe u e 2 (y,x) = foAv I x)f 02 (x). (15) 

It follows that maximum likelihood inference about the parameters Q\ is 
independent of the model for the covariates, and hence finding the MLE of 
the conditional likelihood is equivalent to finding the MLE of the joint likeli- 
hood. If we fix O2 to the corresponding MLE, then the expected information 
agrees in the two models: 

^ 1 og/fl 1 ,g 2 (y» a > \ (m] 
ddidG', y 1 ' 

An important advantage of the RAM formulation is that the empirical vari- 
ance and mean are sufficient statistics for the parameters, thus giving the 
joint formulation a computational advantage over the conditional likelihood 
which requires explicit calculation of the likelihood contribution of each in- 
dividual observation. In particular given the sufficient statistics, which are 
easily computed, the computational complexity in the RAM parameteriza- 
tion is independent of n. 

2.2. Inference - standard SEM 

In the following we will describe inference under a priori non-linear 
constraints ft = £l (twice differentiable w.r.t. 6). Letting p, denote the 
empirical mean of the observed variables, we define 

w fl = [£-&][£- (17) 
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and 



T e = £ + W , (18) 

where X is the ML covariance matrix estimate (non-central estimate). We 
will exploit that 

n n 

^2( z i - £o){zi ~ to)' = ^(«< - + A* - £e){zi ~ P> + A* - 

i=l i=l 

= n(S + W e )+ (19) 

n[{fi - /*)(£ - 6)' + (A* - 6) (A* - A*)'] 
= nT . 

The complete log-likelihood then is given by 

l{0 | z 1; . . . , z n ) = { ~ g lQ g( 27r ) - 2 log " 



i=i 



I(^-e 9 yn^ (20) 

— log(2vr) - 2 log |n | - I trjT^, 1 }, 



with score 



0^(0) n fdvecfleV 



89 2 \ 80 

n ( 8 vec fig \ ' 



vec [n^Tg^ 1 ] 



(p^) vec [Itf] (2!) 



and the MLE is obtained by solving the corresponding score equation by 
Fisher scoring or a similar iterative procedure. See Appendix B for details 
w.r.t. expressions for the relevant matrix derivatives and information ma- 
trix. 

In certain situations we may need to calculate conditional moments, 
for instance to calculate conditional residuals (model diagnostics) or the 
conditional likelihood given the covariates (likelihood ratio testing, model 
selection, etc.). Here we need to apply the selection matrix, J Y , that picks 
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out the endogenous variables, Y, of U, and the cancellation matrix, p x , 



that sets all exogenous variables, X, of U to zero (see Section Appendix A ). 
Notice we do not put any distributional assumptions on the exogenous vari- 
ables. 
Now 

fjLi(6) = E{Y | X = x) = JV(1 - A e )-\p x v 6 + (22) 

where v x is the p- vector which is zero everywhere but on the index of the 
exogenous variables where it is set to x, and 

S = Var(y | X) = J r (l - A e )-\p x P e p' x ){l - A e )- l 'J' Y . (23) 

2.3. Interactions with latent variables 

Including interactions between covariates and random effects in the model 
i.e. with non-zero A and T in (j3]) and (HI) , we clearly loose the property of 
constant variance between individuals and the empirical mean and variance 
are therefore not sufficient. With A and T consisting of ones in all entries, 
the marginal variance of 1^, Var(YJ | Xj, tyj, Wj), will however take the 
same form as flTJ), exchanging B with 

Bi — B + Wt, (24) 

and 

A, = A + Zi, (25) 
hence the expression for the likelihood contribution and its derivatives of a 



single individual will take the same form as derived in Appendix B 

With free parameters in A or T we can still adapt the model by adding 
one or more degenerate random effects. For instance, the term 

£y = fijkVijkr/ik + €ij (26) 

where r\ik and Qjk follow normal distributions, can trivially be parameterized 
as the simultaneous equation 

£ij = SjkVik + e ij) (27) 
Vik = VijkVik + 0, (28) 

hence the model is expanded to include a random effect with residual term 
with variance with fixed slope parameter Z^, and the variance of the 



observed variables in such a model therefore takes the form as described 
above (EH), i.e. A = 1. 

The possibility of including interactions with the random effects adds 
important flexibility to the model class for instance when modeling longi- 
tudinal data or to account for certain types of variance heterogeneity. 

2.4- Non-linear effects 

Allowing non-linear parameter constraints or non-linear effects of some 
covariates opens up for several interesting applications for instance in dose- 
response modeling. An important extension of the model framework is 
therefore to allow parametric non- linear functions (f>^\j = l,...,p and 
ip( 8 \ s — 1, . . . , I of the covariates to enter the model: 

l q i 

Yij = Vj + ^jkVik + 



k=l r=l k=l 

+ <f>®{X il ,...,X iq ) + e ij , 



(29) 



and 



7] is = a s + PskVik + 22 ^ srXir + T ^ W iskVik 
k=l r=l k=l 

+ ^(X ih ...,X iq ) + ( is . 



(30) 



Though we introduce non-linear effects in the model, we will still denote 
this model a Linear Latent Variable Model, as there are only linear effects of 
latent variables and endogenous variables in the model. Hence the observed 
data likelihood and its derivatives still have a closed form solution. 

2.5. Multigroup analysis 

A useful generalization of the model framework is the multigroup model, 
where we have several groups of data and specify a LLVM for each group. 
This naturally leads to the log-likelihood 

logL(0 | Y,X,V,W) = Y J ^gL g {6 g | Y g ,X g ,V g ,W g ) (31) 
where the intersection of the parameters (6 g ){ g eG} i s not empty. Unba l anced 



designs and data with values missing at random (ILittle and Rubinl . 120021 ) 
is naturally handled by this extension by forming groups from the different 
missing data patterns. Note that the additive structure of the log-likelihood 
makes the calculation of score functions and information matrices for the 
multigroup model explicitly available. 
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3. Model specification 



The lava package aims to deliver a dynamic model specification experi- 
ence in the sense that adding or removing model elements should be as easy 
as possible, and the model specification should be familiar to users accus- 
tomed to specifying models in for example glm in R. In order to achieve this 
we have designed a formal system for interactively specifying the complex 
hierarchical structure of a latent variable model. We believe this to be an 
important novel contribution, since the difficult specification of models in 
other languages often proves to be a significant obstacle. 

The implementation relies on R ( 1R Development Core Team! I201CH ) and 
the following packages all av ailable from the Compreh ensive R Archive 



Netw ork (CRAN;, 
2009h . sur vival ( 



mvtnorm flGenz et all 120091). graph ([G entle man et all . 
Therneau and original R port bv Thomas LumlevT 20091 ). 



numDeriv dGilbertl . |2009h and gof dHolstl. l2012h . The graphical svst 



cm 



builds on graphviz (IGansner and North! . Il999l ) an d the R-package Rgraph viz 
( Gentry et all 20091 ) (available from Bioconductor Gentleman et al ( 2004} )). 



The specification of models in the lava-language is primarily achieved via 
the constructor function lvm and the two methods regression and co- 
variance (see Table [TJ. 

A new model object is initialized with the constructor lvm 

> ml <- lvm() 

which creates an empty lvm-object. Variables (or a multivariate regression 
formula as described below) can be fed to the lvm-function as arguments 
in order to control the order of entry in the graph layout. However, this 
is optional as variables automatically will be added during the process of 
defining the linear structure. A list of formulas is also valid as a shortcut 
for successive calls to regression (see below). 

3.1. Specifying Linear Relationships 

Linear associations between variables are specified via the member func- 
tion regression taking the character vector arguments to and from. For 
convenience a replacement function, regressions, is also available which 
in addition supports specification via the usual formula statements in R. As 
a simple example we will specify the following structural equation model 
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Function 


Task 


Primary functions 




lvm 


Constructor of new model 


regression 


Add regression association to model 


co variance 


All 1j* 1 j *11 

Add correlation between residuals 




terms 


intercept 


Add intercept parameter 


constrain 


Add non-linear covariate effects or 




parameter constraints 


Secondary functions 




latent 


Define latent variables in model 


addvar 


Add variable to model 


part ix 


Define equality constraints index of 




parameters 


parameter 


Add a parameter name (for use with 




constrain) 


cancel 


Remove previously defined associa- 




tions 


kill 


Remove variables from model 


Table 1: Model building blocks 

with two measurement models 

Yj = fij + X\jU\ + eij, 

Zj = Vj + \ 2j U 2 + e 2 j, j = 1, • • • , 3, 

and structural model defined by 



U 1 = 5 1 X 1 + 5 2 X 2 + Ci, 

u 2 = pai + /? 2 A 2 + c 2 , 

and Cov(Ci, C2) 7^ 0. The model is illustrated in the path diagram of Figure 

m 

The following commands specifies a multivariate linear regression model 
with two covariates xl and x2 and two outcomes ul and u2 

> ml <- regression (ml, "ul", c("xl", "x2")) 

> regression(ml) <- u2 ~ xl + x2 
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In the following we will focus on the replacement functions and the 
formula specification as defined by the second line, but generally for all the 
available methods a standard function is available and arguments can be 
given as character vectors as above. A more compact call would simply be 

> ml <- lvm(c(ul, u2) ~ xl + x2) 

Next, we define ul and the u2 as a latent /unobserved variables using 
the latent-function: 

> latent (ml) <- ~ul + u2 

Again arguments can generally be given as character vectors instead of a 
formulas. To remove the latent status from a variable we simply use the 
cancel=TRUE argument (latent (ml , cancel=TRUE) <- ...). 
Next we define the measurement part of the model: 

> regression(ml) <- c(yl, y2, y3) ~ ul 

> regression (ml) <- c(zl, z2, z3) ~ u2 

Covariance between residual terms can be specified using the replacement 
function covariance<- (or the function covariance) where the argument is 
a vector (or formula) of variables that are assumed to be pairwise correlated. 
In the current model specification, the residuals of the two latent variables 
are assumed to be conditionally independent given the covariates, and in 
order to define correlation between the residual terms of ul and u2 we write 

> covariance (ml) <- ul ~ u2 

which specifies Cov(£i,£ 2 ) 7^ 0> thus completing the specification of the 
model defined by the path diagram in Figure [TJ Note that the model is 
specified independently of any data. The model is linked to data when pa- 
rameters are estimated using the estimate function (see Section [6]). Here 
it is important that the manifest variable names used in the model specifi- 
cation corresponds to the variable names in the data-frame. 

Removal of associations or variables can be achieved with the cancel 
(and canceK-) function which takes a character vector (or formula) as 
argument, removing any associations between all the variables in the vector. 
To remove the previously specified correlation and instead add a regression 
association between ul and u2, we write 

> cancel(ml) <- ~ui + u2 

> regression(ml) <- u2 ~ ul 
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Figure 1: Path diagram of the model ml with correlation between residuals of the la- 
tent variables. Obtained with the command plot (ml). Following the convention of 
path diagrams observed variables are framed with rectangles whereas latent variables 
are framed with ellipses. Regression associations are depicted as one-headed arrows 
(parent=predictor, child=response) and covariance/correlations are shown as (dashed) 
double-headed arrows. For easier interpretation the following color-codes are used: ex- 
ogenous:=light blue, endogenous :=orange, latent :=green. 



Notice that the last regression call defining the association between ul and 
u2 does not cancel the earlier defined predictors of u2. Hence the current 
definition says (see Figure |2]) that 



E(U 2 | U 1: X 1 ,X 2 )= t i + AA + PxX x + (3 2 X 2 



(32) 



for some parameters (/x, (3 , (3i, f3 2 ). 

To completely remove one or more variables from the model we can use 
the kill<- function. 



3.2. Constraining Parameters 

Defining restrictions on some parameters is usually needed in order to 
obtain an identifiable model, and by default lava will automatically set rea- 
sonable restrictions when model parameters are estimated (see Section [6]). 
Also, in situations where we need to test associations or use a priori knowl- 
edge in the model building, constraints on some parameters are needed. 
The lava package allows specification of completely general non-linear con- 
straints on all parameters. 

The most common type of constraints are identity constraints where 
one or more parameters are fixed to either a specific numerical value or 
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Figure 2: Path diagram of the final specification of the model ml with a direct regression 
association between the two latent variables. 



to be equal to a common free parameter (equality constraints). For co- 
variance and regression parameters (slope-parameters) the regression and 
covariance function can be used, and for the intercepts a and v in ([3}|4|), 
the intercept<- and intercept functions are used. 

As an continuing example we will specify a new multivariate regression 
model with three outcomes, {Yi,Y 2 ,Y^) ) and two predictors, (Xi,X 2 ): 

Yi = faX + jiZ + € h i = 1, . . . , 3 

> mregr <- lvm(c(yl , y2, y3) x + z) 

3.2.1. Constraining regression parameters 

The restrictions of slope parameters can be accomplished with the re- 
gression function. For example fixing the slopes of Y\, Y 2 and Y 3 on X to 
be identical, b%, and defining Z as an offset (i.e. slope 1), can be achieved 
with the calls 

> regression (mregr , c(yl, y2, y3) ~ x) <- "bl" 

> regression (mregr , c(yl, y2, y3) z) <- 1 

To simultaneously define several different constraints a list can be given as 
the right-hand side argument 

> regression (mregr , c(yl, y2) x + z) <- list(l, "a", 2, "b") 

All parameters for the first response on the covariates are given first, then 
for the second response, and so on. Hence in the above example we have 



Y 1 = X + aZ + 



and Y 2 = 2X + bZ + 
14 



(33) 



When denning constraints (intercepts, covariance or regression constraints) 
any missing associations will automatically be added to the model object. 
Hence the following call will add an extra level to the model, with a top-level 
response W with identical effects of Y 1 , Y 2 and Y 3 (see Figure [3]): 

> regression(mregr, w ~ yl + y2 + y3) <- "beta" 

To remove the constraints again (but not removing the associations) we 
simply fix to the logical constant NA: 

> regression(mregr, w ~ yl + y2 + y3) <- NA 

3.2.2. Constraining covariance parameters 

Constraints on the covariance between residual terms are set with the 
covariance function, with a similar syntax. For example, to fix the co- 
variance of the residuals terms of Y\ and Y2 to 0.5, and the variance of the 
residual term of Yi to a parameter vl: 

> covariance (mregr, yl yl + y2) <- list("vl" , 0.5) 

If we only need to constrain the variance parameters (i.e. not covariances) 
then the following syntax can be used 

> covariance (mregr , ~yl + y2) <- "v" 

here setting the residual variance of Y\ and Y 2 to be identical. 

To fix the variance of the variables to different values, we simply give a 
list of the correct length as argument, e.g. 

> covariance (mregr , ~yl + y2) <- list("v" , 0.3) 

If we are interested in fixing only the covariance parameters (and not the 
diagonal) we can add the pairwise=TRUE argument. For instance to specify 
that the covariances between Yi,Y 2 , and Y 3 are the same, we can call 

> covariance (mr egr, ~yl + y2 + y3, pairwise = TRUE) <- "rl" 

thus specifying a compound-symmetry structure. 

Finally a syntax like the one used by the regression-function can be 
used, such that 

> covariance (mregr , c(yl, y2) ~ y2 + y3) <- list (0.5, "r" , "rO" , 
+ 0.3) 
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defines the following covariance structure between residual terms ei,e 2 ,e 3 
corresponding to Yi,Y 2 ,Y 3 (with the • denoting elements that are not af- 
fected by the call): 



0.5 r 

Var((e 1 ,£ 2 ,£ 3 ) , )= | 0.5 r 0.3 | (31) 

r 0.3 

As with intercept and slope parameters, we can remove covariance con- 
straints by fixing parameters to the value NA. We end up with a model as 
defined by the path-diagram in Figure [31 

w > 1 ° 




Figure 3: plot(mregr, labels=TRUE, diag=TRUE) 



3.2.3. Constraining intercepts 

To fix the intercepts of the three outcomes to be identical, we can write 

> intercept (mregr, ~yl + y2 + y3) <- "mu" 

Instead of a parameter name, mu, we could have chosen a numerical, say 0. 

Notice that a character vector could also have been given instead of the 
formula. The value on the right can be given as a list, hence to fix the 
intercept of Y"i and Y 2 to be identical and Y 3 to zero, we could call 

> intercept (mregr, ~yl + y2 + y3) <- list("mu" , "mu" , 0) 

3.3. Simultaneously specifying constraints on intercepts, slopes and vari- 
ances 

Using the formula syntax with the regression method it is possible to 
simultaneously specify constraints on intercept, regression and covariance 

16 



parameters. A special function, f, can be used within the formula to specify 
the slope parameters, and further a pair of square brackets can be appended 
to each variable in the formula. Inside the square bracket the intercept of 
the variable can be defined, or alternatively both the intercept and residual 
variance separated by a colon. As an example we can specify 

Y 1 = a + bU + e 1 , ei ~AT(0,l), 
Y 2 = a + bU + e 2 , e 2 ~Af(0,v), 
U = b 2 X + c, 

using the square-bracket syntax: 

> m.2 <- lvm() 

> regression (m2) <- c(yl[a:l], y2[a:v]) ~ f(u[0], b) 

> regression(m2) <- u f(x, b2) 

An equivalently even more compact model specification can be obtained 
using a list of formulas with the model initializer lvm, hence an equivalent 
way of specifying m2 would be 

> m2 <- lvm(list(c(yl[a:l] , y2[a:v]) ~ f(u[0], b) , u ~ f(x, b2))) 

If the index of the parameter is known (see the coef method below) the 
parf ix method can also be used to simultaneous constrain parameters. For 
example to fix the parameters of a model m, at positions 1, 4 and 5 to the 
values a, a and 1, we can call 

> parfix(m, c(l, 4, 5)) <- list("a", "a", 1) 

3.4- Random slopes 

Random slope effects (i.e. the matrices Vi and Wj) can be defined 
by constraining the slope parameter of a latent variable to the name of a 
covariate. The covariate does not necessarily need to be added to the model 
explicitly, as the slope parameters are matched to the column names of the 
data. frame that is used during estimation (see Section |6]). 

In a standard structural equation model covariates enter the model in 
order to describe differences in the mean structure. Another question might 
be whether covariates can influence the variation in an outcome. In a sit- 
uation where the variance depends on a covariate, the usual assumptions 
of variance homogeneity of the latent variables are not met and the model 
is not a SEM. In the LLVM this situation can be handled by defining a 
regression on the variance component using the random slope specification. 
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Another important example is to use the random slopes to describe varia- 
tion in longitudinal studies, and in combination with measurement models 
this allows us to formulate random regression models taking measurement 
error into account (see Figure HJ). 




Figure 4: Longitudinal analysis with measurement models. Here illustrated with three 
time points where 771, 772, 773 are modeled by a random intercept, 1, and a random slope 
defined by the covariate z,-, i = 1,2,3 and the random effect nf. rji — 1+ KiZi +Q. As the 
77's are only indirectly observed a measurement model is employed at each time point. 



3.5. Non-linear constraints and effects 

Non-linear parameter constraints are defined using the constrain<- 
function. The syntax is 

> constrainfrn, formula) <- function(x) . . . 

where m is the lvm-object, and the left-hand side in the formula specifies 
the parameter that is a (non-linear) function of the parameters or covariates 
defined by the right-hand side, and the function defines this association. 
The result of the function can optionally be given the attributes grad, defin- 
ing the analytic gradient, and inv, defining a monotonic transformation 
(typically the inverse function) used in conjunction with confidence limit 
calculations (see Section [6] on Statistical Inference). 
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As an example we will define the model 



J = 1,2 



(35) 



with Cov(£ji,£i2) = 0, Var(e#) = i»: 

> mconstr <- IvmO 

> regression (mconstr, c(yl, y2j ~ <- list ("bl" , "b2") 

> intercept (mconstr , ~yl + y2) <- "mu" 

> covariance (mconstr , ~yl + y2) <- "v" 

To restrict the variance-parameter to live on the positive real axis, we add 
the constraint 



> constrain (mconstr , v ~ alpha) <- exp 

using a log-link, and the parameter alpha (log-variance) is added to list 
of model parameters. In this example we do not set any attributes and 
numerical derivatives of the constraint will therefore be used (based on 
the package numDeriv) during estimation. Notice that constraints on the 
variance parameters can be set automatically by the estimate-function (see 
Section [6]). For general domain constraints the function range. lvm can be 
used as the right-hand side argument, e.g. range . lvm(a=l ,b=Inf) will 
bound the parameter to the interval (1, oo). 

Continuing the example, we could define the intercept, /x, to be the 
product of the two slope-parameters, 

> constrain (mconstr , mu ~ bl + b2) <- prod 

Constraints can be removed by letting the RHS be NULL (or NA) 

> constrain (mconstr , "mu") <- NULL 

If we instead wish to add a non-linear effect of x on y\ and %j2- 



with $ denoting the standard normal cumulative distribution function, we 
can make the call 

> constrain (mconstr , mu ~ alpha + beta + x) <- function(x) x[l] + 




(36) 



with 




(37) 



+ 



pnorm(x[2] * x[3]) 
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3. 6. Complex models with feedback or co-existance of regression associations 
and covariance between residuals 

The linear latent variable model framework in principle allows pathways 
going in both direction between two variables, i.e. feedback, and also simul- 
taneous presence of both a regression association and covariance between 
residual terms of the same two variables. Note that both cases are different 
from a simple correlation between the residuals. 

A simple example of an identified model is a illustrated in FigureEl where 
the aim is to estimate the effect of Y on X while taking the unmeasured 
confounder C into account. This can be achieved using an instrumental 
variable, /, which by assumption must be (strongly) correlated with X, 
independent with the confounder, and conditionally independent with the 
Y given X. We can model the inflation in covariance between Y and X 
using a correlation between their residual terms, which can be implemented 
as 

> m <- lvm() 

> regression(m) <- y ~ x 

> regression(m) <- x ~ i 

> covariance (m) <- x ~ y 



Y 




Figure 5: Path diagram showing the association between the variables Y and X with 
unmeasured confounder C and instrumental variable /. 

4. Inspecting the model assumptions 

The philosophy of the lava package is to separate the model specification 
from the actual data, since examination of the model structure before actual 
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Function Task 



plot 


T ~. I 111 1 C J 1 11 

riot the graph oi the model 


regression 


Display parameter restrictions 


intercept 


Display parameter restrictions 


covariance 


Display parameter restrictions 


exogenous 


Extract (or define) exogenous variables (predictors) 


endogenous 


Extract set of endogenous variables (responses) 


latent 


T~l j_ j_ j_1*1j_ j_ *11 

Extract set of latent variables 


manifest 


Extract set of manifest (observed) variables 


vars 


Extract all variables 


children 


Extract children of a node 


parents 


Extract parents of a node 


coef 


Get list of parameters 


constrain 


Display non-linear constraints 


path 


Extract direct and indirect unidirectional 




pathways between nodes 


subset 


Extract sub-model 


merge,°/„+°/ 


Merge models 



Table 2: Model inspection functions. 



estimation is often an important aspect of modelling within this class of 
statistical models. The lava package includes several functions as an aid to 
obtain an overview of the model assumptions (see Table [2]). In general, the 
below described methods also applies for a lvmf it-object (see Section [6]). 

The plot-method (see Figure [1] and [3]) visualizes the model using a 
path-diagram, i.e. a graph structure where linear (causal) associations are 
shown with directed edges and covariance between residuals are shown as 
bidirectional edges. Manifest variables are shown as rectangles and latent 
variables as ellipsoids. The parameter constraints can be added as labels 
on the edges with the argument labels=TRUE and variance parameters can 
be added with the argument diag=TRUE (see Figure [3]). The plot-function 
will be explained in more details in Section [TUJ 

Via the summary function, a complete overview of the model and (iden- 
tity) parameter constraints can be obtained. Returning to the model mregr 
defined in Figure [3j 

> summary (mregr) 



Latent Variable Model 
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with: 6 variables. 
Npar=ll+2 

Regression parameters: 
yl y2 y3 x z w 
yl * 
y2 * 
y3 * 
x 1 2 bl 
z a b 1 
w 

Covariance parameters: 
yl y2 y3 w 
yl v 0.5 r 
y2 0.5 rO 0.3 
y3 r 0.3 * 
w * 
Intercept parameters: 
yl y2 y3 w 
mu mu * 

Here the adjacency-matrix for the graph of all the unidirectional edges of 
the path-diagram can be read off under the title Regression parameters (i.e. 
slopes). Similarly the covariance structure of the residual terms and the 
intercept structure are shown. An empty element indicates that there is no 
direct association. A star indicates a free parameter and all other entries 
are either fixed numerical values or parameter names as defined by identity 
constraints during model specification. These three matrices can also be 
extracted via calls to regression, covariance or intercept. 

The exogenous variables (covariates) of a lvm object can be identified 
with the exogenous function. Similarly a list of the endogenous (manifest) 
variables can be obtained with the function endogenous and the subset 
of these variables that do not predict other variables (top-level outcomes) 
can be shown by including the argument top=TRUE. In a similar way the 
observed and latent variables can be shown with manifest and latent. All 
variables of the model are listed with the vars function 

> exogenous (mregr) 

[1] "x" "z" 



> endogenous (mregr, top = TRUE) 
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[1] "w" 

The children and parents functions extracts the children respectively the 
parents of one or several nodes in the unidirectional graph of the model, 
e.g. 

> children (mregr, ~x + yl) 
[1] "yl" "y2" "y3" "w" 

> parents (mregr, ~w) 
[1] "yl" "y2" "y3" 

The pathways from one variable to another can be viewed with the path 
function which returns a list of character vectors indicating the (causal) 
path 

> path (mregr, w ~ x) 
[[1]] 

[1] "x" "yl" "w" 
[[2]] 

[1] "x" "y2" "w" 
[[3]] 

[1] "x" "y3" "w" 

The function subset can be used to extract subsets of a model. To extract 
the upper level of the path analysis we call 

> subset (mregr, ~yl + y2 + y3 + w) 

which keeps all parameter restrictions of the original model. Conversely, 
lvm models can be merged with the merge method (or using the operator 
syntax: m / ++ / m2) . 

To examine the parameters (and in particular their order) one can call 
the coef-function 

> coef (mregr) 

ml m2 pi p2 p3 p4 p5 

"yl" "w" "yl<-z" "y2<-z" "y3<-x" "w<-yl" "w<-y2" 

p7 p8 p9 plO pll 

"yl<->yl" "y2<->y2" "y3<->y3" "w<->w" "yl<->y3" 
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where "<-" represents slope parameters (e.g. z on yl) and "<->" repre- 
sents covariance (See also the describecoef function). With the argument 
labels=TRUE we can get the same vector but with all parameter labels 
substituted by their constraints 

> coef (mregr, labels = TRUE) 



ml 


m2 


Pi 


P 2 


p3 


p4 


p5 p6 


mu" 


"w" 


"a" 


"b" 


"bl" 


"w<-yl" 


"w<-y2" "w<-y3" 


P7 


p8 


P 9 


plO 


pll 






"v" 


"rO" 


"y3<->y3" 


"w<->w" 









The non-linear parameter constraints or non-linear regression specifications 
can be shown with 

> constrain (mconstr) 
$v 

function (x) . Primitive ("exp") 
attr ( , "args") 
[1] "alpha" 

$mu 

function (x) 

x[l] + pnorm(x[2] * x[3]) 

attr ( , "args") 

[1] "alpha" "beta" "x" 

5. Simulation 

Simulation is a major component of modern statistics, which allows us 
to experimentally study the properties of a statistical method under various 
alternatives and to verify (or reject) preliminary ideas. The lava package 
includes the sim method offering a convenient tool for performing simulation 
studies from very general models. 

As an initial example we will create a data. frame with 100 observations 
from the structural equation model ml defined in Section 13.11 

> mydata <- sim(ml, 100) 

The default parameter values are that all intercepts are 0, slope and residual 
variance parameters are 1, and covariance parameters are 0.5. To change 
the simulation parameters one can either fix the relevant parameters of the 
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Function Task 



sim Simulation method for lvm-objects 
functional, constrain Introduce non-linearities in simulation 

distribution Change distribution and link of variables 

heavytail Define heavy tailed distribution of a variable 

normal . lvm Normal distribution 

poisson.lvm Poisson distribution 

binomial. lvm Binomial distribution 

uniform. lvm Uniform distribution 

weibull.lvm Weibull accelerated failure times 

Table 3: Simulation methods. 

model to the desired numerical values as described in the previous section, 
or give the parameters as the argument p directly to the sim method. For 
instance the following two calls will both simulate 1000 observations from 
the model mregr with the default parameter values, except that the residual 
variance of w is set to 2, the intercepts of Y\ and Y 2 (/i) are set to 1, and 
p = 1.5 (the slope of Y l on W) 

> d. mregr <- sim(mregr, 1000, p = c(mu = 2, beta =1.5, ~w<->w~ = 2)) 

> d.mregr2 <- simfrnregr , 1000, p = c(yl = 2, ~w<-yl~ =1.5, ~w<->w~ = 

To simulate data with heavier tails than the normal distribution the heavy- 
tail method can be used. The following defines yl and y2 to be realizations 
from an unstructured bivariate normal distribution Af(fi, E): 

> mhtail <- lvm() 

> covariance (mhtail) <- yl ~ y2 

We let Y be a stochastic variable with this distribution, then the following 
call 

> heavytail (mhtail, df = 3) <- ~yl + y2 

will allow us to draw simulations of yl and y2 from the distribution of 
Y}(3/(5) ' 5 where Q ~ xh i- e -> leading to a multivariate t-distribution with 
covariance matrix E, mean //, and v — 3 degrees of freedom, described by 
the density 

J[x | n, E, v) - 



^I>/2)r(l/2)* (! + i (aj _ M )/ S -i (a . _ Ai) )^)/ 2 

(38) 
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where k = 2 is the dimension. The same realization of Q will be used on 
both yl and y2 in the above code. To make simulations where a different 
realization of the ^-distribution is used for each outcome (leading to a star- 
shaped distribution), one simply has to make separate heavytail calls as 
in 

> heavytail (mhtail, df = 3) <- "yl 

> heavytail (mhtail, df = 3) <- ~y2 

The method can be used with different degrees of freedom for different 
variables in the model, and thus gives access to an easy way of simulating 
models with various degrees of outlier contamination. 

To allow simulations from quite general models, two additional replace- 
ment functions are available, functional and distribution. The func- 
tional replacement function is used for defining (nonlinear) functional re- 
lationships between variables and has the syntax 

functional(x,to,from) <- value 

where x is a lvm-object, from and to are predictor and outcome, respec- 
tively, and value is a real function describing the functional form. In the 
model mregr (Figure [3]) we have 

E(Y 3 \X,Z) = hX + Z (39) 

With the call 

> functional (mregr , y3 ~ x) <- function(x) x'2 
we can simulate from the model 

K(Y 3 \X,Z) = b 1 X 2 + Z (40) 

with the coefficient b± defined by the earlier identity constraint. To define a 
more complex polynomial effect of X we can make a copy of the predictor 
with the copy function and apply functional on the copy 

> copy (mregr) <- x x2 

> regression (mregr , y3 ~ x2) <- "b2" 

> functional (mregr , y3 ~ x2) <- function(x) x~3 

leading to the mean-structure 

E{Y 3 \X,Z) =b 1 X 2 + b 2 X 3 + Z (41) 



An alternative approach is to use the constrain function, e.g. 
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> functional (mregr, y3 ~ x) <- NA 

> kill (mregr) <- ~x2 

> intercept (mregr , ~y3) <- "y3" 

> constrain (mregr , y3 ~ bO + b2 + b3 + x) <- function(x) x[l] + 
+ x[2] * x[4]~2 + x[3] * x[4]~3 

would define the model 

E(Y 3 \X, Z) = b + bxX + b 2 X 2 + b 3 X 3 + Z (42) 

The major difference between the two methods is that functional only has 
an impact on the sim method and not on the inferential methods, whereas 
constrain alters the model fitted by estimate (see Section [6]). 

The distribution replacement function is used for defining the link/ dist- 
ribution of variables, with syntax 

distribution(x, variable) <- value 

where x is a lvm-object, variable is the variable to define the distribution 
of, and value is a function defining the random generator for variable 
taking the form 

function(n, mu, var) 

where n defines the number of samples, mu is the mean, and var is the vari- 
ance as defined by the latent variable model. Some of the most common dis- 
tributions have been predefined in the functions uniform, lvm, normal . lvm, 
binomial . lvm, poisson.lvm, weibull.lvm. 

As an example we will define a simple hierarchical model structure (path 
analysis) 

> msim <- lvm(t y + u + z) 

> regression (msim) <-y~u+x+z 

> regression (msim) <- c(z, u) ~ x 

To change the distribution of Y to a Bernoulli distribution we simply 

call 

> distribution (msim, ~y) <- binomial .lvm () 
The default link is logit 

pry - 1 1 u x z) - *Mh±M±NL±Ml 

P(Y - 1 | U, X, Z) - 1 + exM + PiX + p2U + p3Z y (43) 

but a complementary log-log (cloglog) or probit link can be chosen via the 
link argument, e.g. 
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Figure 6: plot(msim) 



> distribution (msim, ~y) <- binomial.lvm("probit") 
Similarly we can define the conditional distribution of Z as Poisson 

Z rsj pois(E(Z | X)), (44) 
log(E(Z|X))= 7o + 7iX, (45) 

and the conditional distribution of U as uniform 

U = A + X 1 X + sflU Q , (46) 
U ~unif (-1,1)V3, (47) 

and let T follow a proportional hazards model with Weibull baseline with 
scale parameter 1.25 and shape parameter 2 (and no censoring) 

\(t) = \™ eibull (1.25; 2) exp(-a + ot x U + a 2 Y + a 3 Z). (48) 

> distribution (msim, ~z + u + t) <- list (poisson. lvm() , uniform. lvm() , 
+ weibull .lvm(l .25 , 2, cens = Inf)) 

The default simulation parameter values leads to intercepts (3q = A = 
7o = cto = and all the remaining parameters are 1 (including the residual 
variance afj). 

The default distribution of exogenous variables is the standard normal 
distribution, X ~ A/"(0, 1) (and independence between the exogenous vari- 
ables). It is also possible to let a variable be deterministic by simply assign- 
ing a list encapsulating the data: 

distribution (msim, ~x) <- list (seq(0 , 1 , length. out=100) ) 
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Obviously an error will occur if this variable definition differs in length from 
the number of samples to be drawn in the simulation. 

Parameters can be altered in the same manner as endogenous variables 
(e.g. sim(msim, p=c(x=3, 'x<->x <= 2) )), but they can also be fixed di- 
rectly via the function distribution, e.g. 

> distribution (msim, ~x) <- binomial . lvm(p = 0.4) 

defines X to be simulated from a Bernoulli-process (F(X = 1) = 0.4), with 
similar options for the other methods (e.g. poisson. lvm(lambda=2)). 

For a lvmf it-object (see Section [6]) the sim method by default simulates 
observations with the parameter vector p set equal to the estimated param- 
eter vector and with the same number of observations as in the original data 
set. In this case the defaults causes the simulations to be drawn from the 
conditional distribution given the exogenous variables, hence the exogenous 
variables will be fixed at their original values (can be changed with the ar- 
gument xf ix=FALSE, in which case the covariates are simulated with mean 
and variance parameters set to the empirical mean and covariance). 

6. Inference 



Function 


Task 


estimate 


Estimates parameters of lvm-objects 


effects 


Calculates direct and indirect effects 


compare 


Likelihood ratio, Wald and Score tests 


gof 


Goodness-of-fit measures 


logLik 


Extracts individual Likelihood values 


information 


Extracts information matrix 


score 


Extracts individual contribution to score 


bootstrap 


Non-parametric bootstrap 


conf int 


Calculates confidence limits (Wald and profile) 


constraints 


Extracts non-linear parameter constraints 


modelsearch 


Model searching via score tests 


equivalence 


Finds empirically equivalent models 



Table 4: Inferential tools. 



Parameter estimation is achieved with the estimate-function which returns 
an object of class lvmf it. The syntax of the estimation procedure is: 
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estimate(x, data, estimator, control=list () , 
missing, weight, cluster, fix, ...) 

with the following arguments 

x: A lvm-object. 

data: data, frame with variables with the same names as the observed vari- 
ables of the lvm object: manifest (x). The order of the variables is 
not relevant. Alternatively, for models where the empirical mean and 
variance are sufficient statistics (e.g. structural equation models), a 
list with the observed covariance matrix (S), the observed mean mu 
and the number of observations n can be passed ffuments, e.g. 
data=list (S=cov(mydata) ,mu=colMeans (mydata) ,n=nrow(mydata) ) . 

estimator: Choice of estimator. Default is gaussian which is the MLE of 
the model defined by d3J) and (J4]). See also Section [9j 

control: A list of parameters controlling the estimation and optimization 
procedures. See below for more details. 

missing: Logical variable. If FALSE (default) a complete case analysis is 
performed and else a full information likelihood analysis is conducted 
under a MAR assumption. Note that observations with missing co- 
variates are excluded. 

weight: Optional weight matrix to be used by the estimation procedure 
defined by estimator (for multigroup analyses this should be a list of 
weights). 

cluster: Optional cluster variable identifying correlated groups of obser- 
vations in the data set (for multigroup analyses this should be a list 
of cluster variables) to be used in the calculation of robust sandwich 
variance estimates. 

fix: Logical variable (defaults to TRUE). Care has to be taken when specify- 
ing a SEM in order to obtain an identifiable model. As a rule of thumb 
one regression coefficient in each measurement model should be fixed, 
e.g. to 1, and the intercept of one the indicator set to 0. By default the 
model is altered to fulfill these requirements unless f ix=FALSE. Other 
parameterizations can be selected by setting the option param via 
the lava. options function. The above mentioned parameterization 
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is obtained with lava. options (param=" relative") and loading pa- 
rameters and intercepts are then interpreted as differences compared 
to the reference indicator. Setting param="absolute" will result in a 
parameterization where the variance of the latent variables are fixed 
to 1 and the intercept to if not already fixed at some other values, 
lava. options (param="hybrid") alters the model such that the in- 
tercept of latent variables are set to 0, and one factor loading in each 
measurement model is set to 1. 

Calling lava. option(param="none") has the same effect as fix=FALSE 
(in which case the user has to manually define parameter constraints 
that guarantee model identification). 

The optional control argument must be a list of parameters for the opti- 
mization procedure. The element method should be a string pointing to a 
generic optimizer conforming to the syntax of stats: :nlminb (the default 
optimizer), i.e. accepting the objective function (e.g. the log likelihood), the 
gradient, the Hessian and control parameters. Setting method="nlminbO" 
will only use the objective function during optimization, where method= 
"nlminbl" also uses the gradient, and method="nlminb2" the Hessian as 
well. Defining method="NR" will use an alternative Newton- Raphson algo- 
rithm. Variance parameters are as default modeled using a log-link. This 
can be disabled by setting constrain=FALSE. Additional options like the 
number of iterations (iter. max), turning trace information on (trace=l), 
starter function (starterfun (a string pointing to a function generating 
starting values for the optimization), convergence tolerance (tol), reduc- 
tion of step-size of the "NR" optimizer (gamma), etc. can also be defined (see 
the R help page of nlminb and estimate). 

The control parameters can be set globally via the function lava . options. 
For instance to turn on the trace information of the optimizer as default in 
the current session, we would submit lava. options (trace=l) . 

We demonstrate the procedure in the ongoing example (see Figure [2]) 
with the data obtained from the simulation in Section [5j using the hybrid 
parameterization: 

> lava. options (param = "hybrid") 

> e <- estimate (ml , mydata) 

> summary (e) 
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Latent variables: ul u2 
Number of rows in data=100 
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Residual Variances: 
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Estimator: gaussian 



Number of observations = 100 
Log-Likelihood = -1010.407 
BIC = 2167.944 
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AIC = 2066.814 

log-Likelihood of model = -1010.407 
log-Likelihood of saturated model = -1003.438 

Chi-squared statistic: q = 13.93726 , df = 16 , P(Q>q) = 0.6033881 



The parameter estimates in the output are divided into slope parameters be- 
longing to the measurements part of the model (Measurements) (i.e. factor 
loadings), the structural part (Regressions) and the intercepts and residual 
(co)variances of the model. The summary method outputs all parameters of 
the model including the parameters that were fixed in contrast to the print 
method that only outputs the canonical parameters of the model. Notice 
that a regression parameter in each of the measurement models has been 
fixed to one in order to identify the model, and the slope parameter of u 2 
on U\ is therefore interpreted on the scale of Y\\ and Y 2 \. The standardized 
coefficients in the last column are interpreted as the change in standard 
deviation of the outcome when increasing the predictor one standard devia- 
tion. These parameters can be used to compare effects of predictor variables 
measured on different scales. If non-linear constraints were defined then the 
relevant estimates and approximate standard errors will be shown in the 
last part of the summary output. These effects can also be extracted with 
the constraints function. 

The p-values for the variance parameters are deliberately omitted from 
the output as the asymptotic distribution under the null is non-standard 
(derived in special-cases such as the random intercept model as an equal 
mixture of a Xi _ distribution and the Dirac measure in 0. Hence the naive 
p-values based on a Xi-distribution will tend to be conservative). Different 
versions of the information matrix can be used, via the argument typeG 
{"E",''hessian", "outer", "robust"} (expected information (default), mi- 
nus the second derivative of the log- likelihood , outer product of the score, 
and the robust /sandwich variant ( 1Whitd . ll982l )). to calculate the asymptotic 
standard errors of the parameters (see also the information function). 

The last part of the output includes some fit criteria (Akaike and Bayesian) 
and the omnibus goodness-of-fit x 2_ test which is a likelihood ratio test of 
the current model against the saturated model structure. In the conditional 
model formulation, the least restrictive model allows covariance between 
residuals of all endogenous variables and the mean-vector /x to be a gen- 
eral linear combination of all the covariates. In the unconditional model 
formulation (Tl5j) . this corresponds to a completely free mean structure and 
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a covariance S that can be any symmetric positive definite matrix. Clearly 
the maximum likelihood is attained at the sample mean and non-central 
empirical covariance matrix, S. Hence the log-likelihood of the saturated 
linear model is 

~ (k]og(2*) + Iog(|5|) + ^(p + q)j . (49) 

This part of the output can also be obtained with the call 

> gof(e, chisq = TRUE) 

Number of observations = 100 
Log-Likelihood = -1010.407 
BIC = 2167.944 
AIC = 2066.814 

log-Likelihood of model = -1010.407 
log-Likelihood of saturated model = -1003.438 

Chi-squared statistic: q = 13.93726 , df = 16 , P(Q>q) = 0.6033881 

RMSEA (90% CI): (0;0. 08127) 
rank (Information) = 23 (p=23) 
condition(Inf ormation) = 0.007602741 
I | score | | ~2 = 1.306582e-08 

Here the omnibus test gives a p-value of 0.60, thus indicating a reasonable 
agreement between the model implied and empirical covariance structure. 

Byproducts of the maximum likelihood estimation such as the score, 
information and log-likelihood can be obtained with the functions score, 
information (see also vcov), and logLik. The individual contribution to 
score and log-likelihood are calculated with the argument indiv=TRUE. The 
methods also allow altering the parameter, data, weights and type of esti- 
mator (arguments p, data, weight and estimator). Predictions can be ob- 
tained via the predict method (and conditional residuals via residuals), 
where the latent variables are predicted by the conditional expectation given 
all manifest variables (the prediction can be based on conditioning on a sub- 
set of the manifest variables defined by the second argument of predict). 
For instance to estimate E(C/i | Y 1 ,Y 2 ,Y 3 , Xi, X 2 ) we can call 

> ulhat <- predict (e, ~yl + y2 + y3) [, "ul"] 

Prediction of the residual terms can be obtained by setting the argument 
residual=TRUE. 



> zetalhat <- predict(e, ~yl + y2 + y3, residual = TRUE) [, "ul"] 
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Assessment of linearity and distributional assumptions can be based on 
examination of different residuals and their association with for exam ple 
covariates in the model. We refer to the gof package (iHolstl . l2012h for 
residual based goodness-of-fit methods for structural equation models fitted 
with lava. 

6.1. Direct and indirect effects 

One of the strengths of the structural equation model framework is the 
possibility of decomposing the effects of a predictor into direct and indirect 
effects. In the model ml (Figure [2]) we have the following relations 



Z3 — X23U2 + 623, 

U 2 = 1 X 1 + 2 X 2 + >yU 1 + Z 2 , 

U X = 5 1 X X + 8 2 X 2 + £ U 



(50) 
(51) 
(52) 



and we wish to quantify the effect of X% on Z 3 . The direct effect is zero as 
Xi is not present in (|50|) . By substituting (l5"Tj) and ( 152]) into (!50|) . it follows 
that 



$1X23X1 + ^2^23^2 + /32A23-^2 + ^17^23^1 

+ <5 2 7A 23 ^2 + 7A236 + A23C2 + e 2 3- 



(53) 
(54) 



Hence the total effect of X\ is the sum of the two indirect effects /3iA 2 3 and 
^17^23- The estimation uncertainty of this effect can be approximated by 
the delta method. In general the distribution of a product of estimators can 
be approximated in the following way. Let 



/G§) =/(&,...,&) =na 



(55) 



8=1 



where the estimated parameters (3 are asymptotically normally distributed 
with covariance matrix S^. Now 



V/G9) 



Vn^ fc ^ J 



(56) 



and 



n [f((3)-f((3 



v 
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(57) 



The approximate distribution of linear combinations of products is obtained 
by straightforward calculations (i.e. V/i + V/2). 

The effects function can be used to estimate the (direct and indirect) 
effect of one variable on another together with approximate standard errors, 
e.g. 

> (f <- effects (e, z3 ~ xl)) 

Total effect of 'xl' on 'z3': 

1.823569 (Approx. Std.Err = 0.1524713) 
Direct effect of 'xl' on 'z3': 

(Approx. Std.Err = NA) 
Indirect effects: 

Effect of 'xl' via xl->ul->u2->z3 : 

1.077448 (Approx. Std.Err = 0.2170743) 
Effect of 'xl' via xl->u2->z3: 

0.7461208 (Approx. Std.Err = 0.222262) 

> coef(f) 



6.2. Hypothesis testing 

Next we estimate the parameters of two nested models where we have 
restricted all factor loadings to 1 and in addition removed the effect from 
Ui to U 2 

> mla <- ml 

> regression(mla, c(zl, z2, z3) ~ u2) <- 1 

> regression(mla, c(yl, y2, y3) ~ ul) <- 1 

> mlb <- mla 

> cancel (mlb) <- u2 ~ ul 

> ea <- estimate (mla, mydata) 

> eb <- estimate (mlb, mydata) 



z3<-u2<-ul<-xl 
z3<-u2<-xl 



Total 



Direct 



Estimate Std.Err z value Pr(>|z|) 

1.8235689 0.1524713 11.960083 0.000000e+00 

0.0000000 NA NA NA 

1.0774481 0.2170743 4.963498 6.923458e-07 

0.7461208 0.2222620 3.356943 7.880941e-04 
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6.2.1. Likelihood Ratio Test 

For nested models, Aii C M.2-, the natural test is the likelihood ratio 
test (LRT) 



\og LAOA- log L 2 (0 2 ) 



approx. 2 
~ XAdf 



(58) 



For non-nested models, one choice is the Akaike's Information Criterion 
(AIC) favoring models with low values of 



AIC = -21og(L) + 2n 



par j 



(59) 



where n par is the number of parameters in the model (implemented in the 
AIC function), or the Bayesian Information Criterion 



5/C = -21og(L) + n par log(iV), 



(60) 



where N denotes the total number of observations (jRaftervl . 119 93). i.e. the 
number of endogenous variables times the number of individuals. 
Successive LRT between nested models can be calculated with 

> (LRT1 <- compare(e, ea, eb)) 
[[1]] 

Likelihood ratio test 



data: 

chisq = 6.6542, df = 4, p-value = 0.1553 
sample estimates: 

log likelihood (model 1) log likelihood (model 2) 
-1010.407 -1013.734 



[[2]] 

Likelihood ratio test 

data: 

chisq = 33.9868, df = 1 , p-value = 5.549e-09 
sample estimates: 

log likelihood (model 1) log likelihood (model 2) 
-1013.734 -1030.728 
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Hence we accept the hypothesis that all factor loadings are equal (the model 
mla) but reject the hypothesis that the two latent variables U\ and U2 are 
conditional independent given the covariates. 

6.2.2. Wald Test 

The compare method can also deal with hypothesis testing via Wald or 
Score tests. The hypothesis 

H :{3 = (3 (61) 

for a subset, (3, of all the parameters, can be tested with a Wald test using 
the par and null arguments (the latter defaults to 0), for instance to test 
if all loading parameters are 1 (equivalent to the LRT of ml against mla), 
we can write 

> (Wl <- compare(e, par = c("y2<-ul", "y3<-ul" , "z2<-u2" , "z3<-u2") , 
+ null = rep(l, 4))) 

Wald test 

data: 

chisq = 7.2687, df = 4, p-value = 0.1223 

For a general estimable contrast C, we can also test the hypothesis 

H :C[3 = P , (62) 

where C is matrix (or vector) with the same number of columns as the 
number of parameters, or alternatively a sub-matrix with column names 
given by parameter names (implicitly assuming that omitted columns are 
zero), leading to the test statistic 

(Ci§- A,)'(CE^C7')- 1 (C j 8- A) ~ xL k (c) (63) 

The covariance matrix will by default be the variance matrix as defined 
by the chosen estimator (for the linear gaussian models, estimator="gaussian", 
this is the inverse of the expected information), but it can optionally be 
given as the argument Sigma, e.g. to use a robust variance estimate (see 
information method). 

To test whether all the intercepts of the outcomes sum to zero, we can 
write 
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> C <- rep(l, 6) 

> names (C) <- endogenous (ml) 

> (W2 <- compare (e, contrast = O) 

Wald test 

data: 

chisq = 0.1769, df = 1 , p-value = 0.6741 

hence we accept the hypothesis of equal intercepts. 
6.2.3. Score Test 

With the scoretest argument we can conduct Score tests. Letting 
be the parameter belonging to the less restrictive model Ai 2 , which is equal 
to 0i, the MLE of the restrictive model Aii, for all the parameters shared 
by M 2 and zero elsewhere. The test statistic is then given by 

S = S 2 {0)'T 2 l {0)S 2 {0) (64) 

with approximate XAd/~distribution under the null, where S 2 and T 2 are the 
score and information matrix of model Ai 2 . 

We will test whether adding correlation between the residuals terms of 
Z 3 and Z 2 significantly improves the model fit: 

> (SI <- compare (e , scoretest = z3 z2)) 

Score test 
data: z3 ~ z2 

chisq = 0.1186, df = 1 , p-value = 0.7306 

which does not indicate evidence against the conditional independence as- 
sumption. 

Similarly we can test the statistical significance of simultaneously adding 
two extra correlation parameters: 

> (S2 <- compare (e, scoretest = c(z3 ~ z2, zl ~ z2))) 

Score test 



data: z3 z2 zl z2 

chisq = 2.03, df = 2, p-value 



= 0.3624 
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6.2.4- Model searching with the Score test 

An advantage of the Score test over the LRT is that the MLE is only 
needed in the more restrictive model making it an ideal instrument for model 
searching, in order to check that important aspects of the covariance struc- 
ture has not been neglected in the model specification. The modelsearch 
function can be used to examine all one-parameter extensions of the model. 
The following call give the 5 most significant one-parameter extensions 

> print (ms <- modelsearch (e) , tail = 5) 

Score: S P(S>s) Index holm BH 

2.024 0.1549 u2<->yl 1 0.9031 

2.124 0.145 yl<->zl 1 0.9031 

2.283 0.1308 yl<->y2 1 0.9031 

2.892 0.08904 y2<->z2 1 0.9031 

4.207 0.04025 y2<->zl 1 0.9031 

As expected we do not see any significant improvements of the model among 
the 5 most significant Score tests (with the first and second column being 
the test statistic and corresponding p-value, and the last two columns be- 
ing the p- values adj usted by the Bonferroni-Holm procedure to control the 
overall Type I error f lHolml . ll979h . and q- values of the Benjamini-Hochberg 



procedure controlling the FDR). Similarly, the most important fc-parameter 
extensions to the model, can be examined with the argument k, but the 
number of models to search through will increase dramatically with k. 

6.3. Model equivalence 

A challenge in multivariate modeling is the problem of equivalent mod- 
els, where two different parameterizations leads to identical model fit (likeli- 
hood) for all data sets. Hence without strong a priori knowledge of the model 
structure, e.g. based on other scientific evidence, the interpretation of model 
parameters mus t be made ca utiously Formal proofs of model equivalence 



can be difficult (IBollenl . 119891 ) . To identify candidates of equivalent models 
we suggest using the Score test. The idea is to study all one-parameter 
extensions of a given model using the score test. Two models are said to be 
empirically equivalent if the score tests agree. This can be achieved with the 
equivalence function. Two variables of the model are chosen, which not 
necessarily are defined as being directly related in the model structure. The 
score function for the model including covariance between the residuals of 
the two selected variables is then compared with score functions of models 
omitting this association, but with the same number of parameters. 
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As a simple example we will investigate the structural equation model 
in the path diagram of the left panel of Figure [7J 



> mEq <- lvm(list(c(yl, y2, y3) u, u x)) 

> latent (mEq) <- ~u 

> covariance(mEq) <- yl ~ y2 

> dEq <- sim (mEq, 100) 

> est. mEq <- estimate (mEq, dEq) 

Below we are examining whether the inclusion of a residual correlation 
between Y\ and Y% has any equivalent formulations 

> (Eq <- equivalence (est .mEq, yl ~ y2)) 

In fact, an equivalent model is defined by instead adding a direct effect of 
X on Kj (see Figure [7]). 

0) yl<->y2 (10.31) 
Empirical equivalent models: 

1) y3<->x (10.31) 
Candidates for model improvement: 

none 




Figure 7: Example of two equivalent models (mEq). 



6.4- Confidence limits 

Wald confidence limits can be created using the method conf int. How- 
ever, for some parameters better coverage can be achieved with alternative 
methods. One method is the non-parametric bootstrap which can be cal- 
culated with the function bootstrap. The bootstrap is a computational 
intensive method, and parallel computa tion can be done by registering a 
foreach (jREvolution Computingl . 120091 ) parallel adaptor. In this exam- 
ple we will compute the bootstrap in parallel using the parallel and 
doParallel packages distributing the bootstrap computations across the 
available CPU cores 
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> library (doParallel) 

> regi s t erDoParal 1 el () 

> (B <- bootstrap (e, 500)) 

Non-parametric bootstrap statistics (R=500) : 
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To bootstrap other statistics a user-defined function can be supplied as 
the argument fun. A parametric bootstrap can be computed setting the 
argument parametric=TRUE and setting the argument p to parameter values 
of the null model from which to simulate from. The parallel computation 
functionality can be disabled via the call lava, options (parallel=FALSE). 

As an alternative to the resample-based approach, we can also calculate 
the confidence limits based on the profile likelihood: 

> (ci <- confint(e, profile = TRUE, parm = "u2<-ul", level = 0.95)) 



2.5°/, 97.5 % 
u2<-ul 0.6392854 1.332559 
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where the parameter of interest can be given as the index or label name. 

A third option is to use a variance stabilizing transform of the param- 
eter. As an example we will calculate the confidence limits of the partial 
correlation (or conditional correlation) between two outcomes Y\ and Y 2 
given covariates X. Hence we assume that 



and aim to estimate the correlation, p, between S\ and 82- 

We define this model with a single covariate and simulate some obser- 
vations and find the corresponding MLE: 

> m <- lvm(c(yl, y2) ~ x) 

> covariance(m, yl ~ y2) <- "C" 

> covariancefrn, ~yl + y2) <- list("vl", "v2") 

> d <- simfrn, 100) 

> e.pcor <- estimate (m, d) 

Note with the default parameter values the correlation between e\ and e 2 is 
0.5. Next we define the correlation parameter using a non-linear parameter 
constraint and obtain the estimate with confidence limits based on the delta 
method 

> constrain (e .pcor , rho C + vl + v2) <- function(x) x [1] / (x [2] * 
+ x[3])~0.5 

> constraints (e .pcor) 

Estimate Std. Error Z value Pr(>|z|) 2.5% 97.5% 

rho 0.4449733 0.08019988 5.548303 2.884553e-08 0.2877844 0.6021621 

Near the boundary of the parameter space these limits will tend to perform 
poorly and a better approach is to apply the variance stabilizing arctanh 
transform (Fishers z-transform) : 



Here we also supply the analytical gradient (optional) calculated with the 
chain-rule and in addition we set the attribute inv defining the inverse 
transformation, thus giving us the confidence limits on original correlation 
scale: 



Y t = frX + £i , i = l,2 



(65) 




(66) 
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> constrain (e .pcor, z ~ C + vl + v2) <- function(x) { 
+ f <- function(p) p[l] /sqrt(p[2] * p[3]) 

+ res <- atanh(f (x) ) 

+ df <- function(p) c(l/sqrt(p[2] * p[3]), -f(p)/(2 * p[2]), 

+ ~f(p)/(2 * p[3])) 

+ datanh <- function(r) 1/(1 - r~2) 

+ attributes (res) $grad <- function(p) datanh (f(p)) * df(p) 
+ attributes (res) $inv <- tanh 
+ return (res) 
+ } 

> constraints (e .pcor) 

Estimate Std. Error Z value Pr(>|z|) 2.5% 97.5% 

rho 0.4449733 0.08019988 5.548303 2 . 884553e-08 0.2877844 0.6021621 

z 0.4784149 0.10000000 4.784149 1.717133e-06 0.2824185 0.6744113 

inv(z) 0.4449733 NA NA NA 0.2751420 0.5878741 



In fact ^Z(p n ) -4 JV(0, 1) as n — > oo (jLehmann and Romanol . 120051 ) . 



Note that the correlation method calculates the correlation coefficients 
of a lvmf it object in a mor e elega nt way and with a slightly more precise 



variance estimate flHotellingl . Il953l ) (see also the partialcor function) 



7. Multigroup models 

Multigroup analysis ( 131|) can be used to combine different models linked 
via some shared parameters. Among other things this extension can be 
useful in testing general hypotheses of linear interactions, and the lava 
package supports this generalization via the estimate-function taking a list 
of lvm-objects and a list of data. frame's as arguments and returning an 
object of class multigroupf it: 

> estimate (list (ml , m.2, m3, ...), list(dl, d2, d3, ...), ...) 

The list of lvm objects can optionally be named, as in the example be- 
low, to enhance the output. Parameters that are shared across the models 
ml,m2,m3,. . . will be also be shared in the multigroup analysis, whereas all 
other parameters will be estimated independently between the groups. In 
many applications the first argument will therefore be repetitions of the 
same lvm-object. Note, that when the different datasets are defined from 
a single data. frame using a grouping variable, the function split can be 
applied to define the second argument. A typical multigroup analysis call 
will therefore resemble 
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> estimate (rep (m, n) , split(d, d$x)) 



where the data-frame d here is split into a list defined from the variable d$x 
(with, in this case, n distinct values). 

As an example we will create two nearly identical lvm-objects describing 
simple factor models (see Figure [8]): 

> mgl <- lvm() 

> regression (mgl , Yl H) <- 1 

> intercept (mgl , ~Y1) <- 

> regression(mgl) <- c(Y2, Y3) ~ H 

> regression(mgl) <- H ~ E 

> latent (mgl) <- ~H 

> mgl <- baptize (mgl) 

> covariance(mgl , endogenous (mgl)) <- NA 

> mg2 <- mgl 

> intercept (mg2, ~Y2 + Y3) <- 

The baptize function labels all free parameters of the model, giving the 
parameter the names as defined by the coef function ("YK-H", "H<->H" 
etc.). An optional argument labels can be given to define custom labels. 

In the above example the restrictions of the variances of the residuals 
of the endogenous variables are removed, and hence the two models mgl 
and mg2 share all parameters except for these variance parameters, and the 
intercepts which are identical in mg2. 

Next we simulate two datasets from model 1 (thus in fact only a single 
group): 

> datal <- sim(mgl, 200) [, manifest (mgl)] 

> data2 <- sim(mgl, 200) [, manifest (mgl)] 

To estimate parameters using MLE we simply type 

> (e.mg <- estimate (list (~ Arm 1~ = mgl, " Arm 2" = mg2) , list (datal , 
+ data2))) 



Group 1: Arm 1 (n=200) 

Estimate 

Measurements : 

Y2<-H 0.99827 
Y3<-H 1.04265 

Regressions : 



Std. Error Z value Pr(>|z|) 

0.06189 16.13044 <le-12 
0.06527 15.97469 <le-12 
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HOE 0.87361 0.06612 13.21214 <le-12 
Intercepts : 

H -0.09120 0.06334 -1.43978 0.1499 

Y2 -0.06491 0.09168 -0.70803 0.4789 

Y3 -0.06794 0.10117 -0.67155 0.5019 
Residual Variances: 

Yl 0.93968 0.13726 6.84580 

H 1.03203 0.12038 8.57301 

Y2 1.00505 0.14252 7.05186 

Y3 1.30951 0.17326 7.55823 



Group 2: Arm 2 (n=200) 

Estimate Std. Error Z value Pr(>|z|) 

Measurements : 

Y2<-H 0.99827 0.06189 16.13044 <le-12 

Y3<-H 1.04265 0.06527 15.97469 <le-12 
Regressions : 

H<-E 0.87361 0.06612 13.21214 <le-12 
Intercepts : 

H -0.09120 0.06334 -1.43978 0.1499 
Residual Variances: 

Yl 0.82723 0.12448 6.64570 

H 1.03203 0.12038 8.57301 

Y2 1.03718 0.14115 7.34796 

Y3 1.08498 0.15032 7.21790 



Comparisons of multigroup structures can be conducted using a LRT. As 
an example we fit the single group LLVM and perform a LRT to test whether 
the residual variances are the same in both groups and the intercepts are 
zero 

> e0 <- estimate (mg2, rbind (datal , data2)) 

> compare (eO , e.mg) 

Likelihood ratio test 

data: 

chisq = 2.3744, df = 5, p-value = 0.7953 
sample estimates: 

log likelihood (model 1) log likelihood (model 2) 
-2001.323 -2000.136 
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8. Data with missing values 



Missing data are common in studies with multivariate outcomes and 
complete case analyses can in these settings become quite inefficient and 
are further only consistent when data are missing completely at random 
(MCAR). 

Under the more general assumption that data are missing at random 
(MAR), i.e. tha t the missing data mec hanism depends only on the ob- 
served variables fjLittle and Rubinl . l2002l ). then inference can be based on 
the marginal likelihood, where the missing values has been integrated out 



f (y b S ; 0)= / f (yobs, y mi*; 0)dy n 



(67) 



Here f(y bs,ymis',@) is the full likelihood of both the observed data (yobs) 
and the missing part (y m is), parameterized by 9. 

In lava, MLE under the MAR assumption can be obtained by adding 
the missing=TRUE argument to estimate (both for lvm and multigroup 
objects) using the multigroup framework on the different missing patterns 
of the data. 

To demonstrate this, we will imitate a MCAR missing data mechanism 
on the first of the datasets simulated from mgl (see Section[7|), with a massive 
30% probability of missingness on each outcome 

> dO <- makemissing (datal , p = 0.3, cols = endogenous (mgl)) 

and the full-information maximum likelihood estimates can then be 
achieved with the call: 

> e.mis <- estimate (mgl , dO, missing = TRUE) 

> summary (e .mis , std = NULL, labels = FALSE) 
Latent variables : H 

Number of rows in data=199 (73 complete cases, 7 groups) 



Estimate Std. Error Z value Pr(>|z|) 



Measurements : 

YK-H 

Y2<-H 

Y3<-H 
Regressions : 

H<-E 



1.00000 
0.90265 
0.95524 

0.83840 



0.10861 8.31067 <le-12 

0.11812 8.08673 <le-12 

0.11018 7.60938 <le-12 
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Intercepts : 

Yl 0.00000 

H -0.14287 0.11042 -1.29388 0.1957 

Y2 0.05932 0.11132 0.53288 0.5941 

Y3 -0.01611 0.12958 -0.12433 0.9011 
Residual Variances: 

Yl 0.68217 0.17881 3.81499 

H 1.20761 0.21942 5.50375 

Y2 1.02999 0.19284 5.34120 

Y3 1.37374 0.23948 5.73625 



Estimator: gaussian 

Log-Likelihood = -730.9461 
BIC = 1525.811 
AIC = 1481.892 



with standard errors based on the observed information ( IKenward and Molenberghsl . 



1998). 



9. Beyond the standard linear Gaussian case 

While linear Gaussian models cover many useful situations there are 
clearly cases where these models are no longer adequate. In this section we 
will briefly describe extensions of lava that covers some of these cases. 

9.1. Clustered correlated data 

Models with very complex hierarchical structures can be estimated in 
lava. However, the full specification of such a model can be challenging 
and perhaps more importantly, as the lowest level in a such a model is 
often not of primary interest, it can be more natural to relax the model 
assumptions for this part of the model. 

As a hypothetical example we can imagine that the aim of a study is to 
estimate the association between noise levels and health. In practice this is 
done by measuring the average noise level, E, in different neighborhoods. 
We assume that the health status is measured indirectly for each subject 
by three proxy measures, Yu, Y 2 {, Y 3i (e.g. blood pressure and cholesterol 
levels), and that the with- in subject correlation between these measurements 
can be described by a single latent variable, Hi (see Figured]). The effect of 
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noise on health is quantified as the linear association between Hi and E. The 
study is complicated by the fact that measurements within neighborhoods 
are correlated beyond what Hi is capturing (air pollution, crime levels, 
traffic and other factors that could affect stress levels) and disregarding 
this with-in cluster correlation will generally lead to too optimistic standard 
errors of the noise effect. 

Inference can instead be based on the i.i.d. decomposi tion of the score 
leading to a sandwich estimator (GEE-type) of the variance (jWilliamsl . |2000| ) 

where S is the total score and <S( C ) is the sum of the scores within clus- 
ter c, and K denotes the tota l number of clusters. Simulation studies 
f jYan and Find . l2004t iPaild . Il988i ) indicate that the sandwich estimator works 
well with K > 50. 

With 5 individuals sampled from each cluster/neighborhood, the above 
model could specified with 

> K <- 5 

> mclustl <- lvm() 

> for (i in 1:K) { 

+ xyz <- c("Yl", "Y2", "Y3") %+% i 
+ h <- "H" %+% i 

+ regressionfrnclustl , to = c(xyz), from = h) <- list(l, "11", 
+ "12") 

+ regression (mclustl , to = h, from = c("U", "E")) <- list(l, 
+ "b") 

+ intercept (mclustl , c(xyz)) <- list ("mx" , "my", "mz") 
+ covariance (mclustl , c(xyz, h)) <- list("vx" , "vy" , "vz" , 
+ "v") 
+ } 

> latent (mclustl) <- c("H" %+% 1:K, "U") 

> intercept (mclustl , latent (mclustl)) <- 

We simulate data from 250 clusters and obtain the MLE 

> dclust <- sim(mclustl, 250, p = c(b = 0.3)) [, manifest (mclustl)] 

> eclust <- estimate (mclustl , dclust) 

and we wish to compare this with the marginal model (see Figure |8]): 
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> dclustWide <- reshape (dclust , direction = "long", varying = lapply(list("Yl" , 
+ "Y2", "Y3"), function(x) x %+% 1:K), v. names = c("Yl", "Y2" , 

+ "Y3")) 

> dclustWide <- dclustWide [order (dclustWide$id) , ] 

> mclust <- lvm(list(c(Yl, Y2, Y3) ~ H, H ~ E)) 

> latent (mclust) <- ~H 

> eclustO <- estimate (mclust , dclustWide) 




Figure 8: plot (mclust) : Marginal model for the noise-health example. 

The marginal estimates with robust standard errors are obtained easily 
by giving the name of the column in the data. frame that specifies the 
clusters (or an actual vector) as argument to the estimate function 

> eclustl <- estimate (mclust, data = dclustWide , cluster = "id") 

In this example we see a substantial under-estimation of the standard 
errors of the pollution effect estimate in the naive approach (with covariates 
varying within clusters the bias could go in the opposite direction as well). 
In contrast, the results of the marginal approach is close to those the full 
model. 

> res <- rbind(coef(eclust, 2) ["b" , ], coef (eclustl , 2) ["H<-E" , 
+ ], coef(eclustO, 2) ["H<-E" , ]) 

> rownames(res) <- c("full MLE" , "Marg . robust" , "Marg. naive") 

> res 

Estimate Std. Error Z value Pr(>|z|) 
full MLE 0.3683574 0.07816049 4.712835 2.442944e-06 
Marg. robust 0.3702012 0.07658741 4.833708 1.340131e-06 
Marg. naive 0.3702012 0.04843684 7.642967 2.131628e-14 
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Typically the loss of power in this marginal approach is modest, and is 
countered by circumventing the need for explicit (mis) specification of the 
distribution of the cluster random effect. 



9.2. Mixture models 

The normal distribution is a central assumption in ([3]) and (J4]), and one 
way to relax this assumption while still avoiding the need for computational 
intense numerical approximations of the likelihood function of the observed 
data, is to allow mixtures of normal distributions in the model. Applica- 
tions include pattern recognition and cluster analysis (machine learning), 
outlier analysis and modeling of heterogeneity, e.g. adjusting for unknown 
subpopulations in a sample. 

In general we will allow models to be described by the convex combina- 
tion 

K K 

fe(y\z) = J2^fe j ) (y\z), £>i = 1, ^]0,1[ (69) 

3=1 3=1 

where fj^ . is a probability density of a LLVM with responses y and covari- 
ates z = (x',v',w'y, and 9 is the parameter-vector (Uj9j, 7Ti, . . . , 7Tk-i), 
noting that the fy's need not to be disjoint. We denote the number classes 
K. 

The likelihood for the mixture model is therefore 

n K 

L(%,z)=nE^ ( ?^i 2 *) ( ? °) 

i=l j=l 

To solve the corresponding sco re equation, the EM algorithm is typically 
applied (IDempster et all . ll9771 ). We introduce latent indicator variables 
i — 1, . . . , n, describing the class membership of the observation (j/j, Zi). 

~^-{Uij belongs to class j} 

(71) 

and hence E£y = F(yij belongs to class j) = iTj. We can then treat the 
mixture analysis as a missing data problem, v = (y,z,£), and complete- 
data log-likelihood: 

n K 

\ogL c {e\v) = ^r$>,log (jf (^ I Zi)) (72) 

i=l j=l 
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While the EM-algorithm is generally slower than Newton-Raphson (sub- 
linear vs. quadratic convergence), this disadvantage is compensated by the 
fact that the EM-algorithm tends to be less sensitive to choice of starting 
values as the algorithm guarantees a non- decreasing likelihood in each iter- 
ation. In contrast, NR can behave erratically for poor choices of starting 
values. To address possible problems with convergence to a local maximum, 
it is still advisable to start the algorithm at several different starting points 
in the parameter space. The EM algorithm also implicitly constraints the 
probability vector to the correct parameter space, as the E-step at iteration 
/ leads to a simple expression of the posterior class probabilities 

ff<<) = ^f" (Wi *> (73) 
In the M-step we obtain the new class probabilities 

Sf" = i£*# (74) 

1=1 

and is derived by solving 

n K 

arg max, Q(6; = ^ ]T 5$ log {f^Vi | *,)) (75) 

i=i j=i 

which for a general LLVM mixture is optimized iteratively (NR pr. default). 
In principle a model relating class probabilities to covariates could also be 
included in this setup leading to a M-step where we instead of the simple 
expression (173|) would have to maximize a weighted multinomial logit model. 

To analyze a mixture model in lava the plugin package lava. mixture 
needs to be loaded. The function mixture fits the mixture model to a list 
of lvm objects implicitly defining the number of mixture components of the 
model. Via the control argument the parameters of the EM algorithm can 
be adjusted, such as the starting values (start), number of random starting 
points (nstart), convergence tolerance (tol, change in log-likelihood), etc. 

> library (lava. mixture) 

> mixture (list (ml,m2, . . .) , data=mydata, control, . . .) 

Instead of a list, a single lvm object can be given as argument with the 

argument k specifying the number of mixture components. 
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As an example we will simulate data from a simple model (see Figure 
[9]), where we have a single dichotomous unmeasured confounder z {P{z = 
1) = 0.5), which have a direct linear effect on the outcome of interest Y 

Y = i2 Y + f3X + lY Z + e Y (76) 

and on the predictor X, which we assume is conditionally normally dis- 
tributed given Z 

X = fi X + IxZ + e x (77) 

In our simulation we will let all intercepts be zero, residual variances 1, and 
slopes as defined by Figure [9j 

> mixO <- lvm(list(Y ~ X + Z, X ~ Z)) 

> distribution(mixO, ~Z) <- binomial . lvm() 

> dO <- simfrnixO, p = c(~Y<-Z~ =2), n = 500) 




z 



Figure 9: Model mixO with unmeasured confounder z. 

Next, we will define the mixture regression that takes into account that we 
have unobserved heterogeneity caused by Z. The base model is the simple 
linear regression model of Y given X, however, with a covariance call we 
define X to be endogenous and let all parameters except for the intercepts 
of Y and X be fixed between the classes 

> mixl <- lvm(Y ~ X) 

> covariance (mixl , ~X) <- "v" 

> mixl <- baptize (mixl) 

> intercept (mixl, ~Y + X) <- NA 
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defining the model 

Y = fjLy, c + f3X + e, (78) 

with the index c denoting the class. The model with two classes is fitted 
with the call 

> M <- mixture (mixl, dO, k = 2) 

> (s <- summary (M) ) 

Cluster 1 (n=231, Prior=0.4687) : 

Estimate Std. Error Z value Pr(>|z|) 

Regressions : 

Y<-X 0.95114 0.07881 12.06853 <le-12 

Intercepts : 

Y 2.11558 0.15240 13.88137 <le-12 
X 1.11484 0.11280 9.88306 <le-12 

Residual Variances: 

Y 1.14553 0.12566 9.11583 
X 1.05875 0.11837 8.94435 

Cluster 2 (n=269, Prior=0 . 5313) : 

Estimate Std. Error Z value Pr(>|z|) 

Regressions : 

Y<-X 0.95114 0.07881 12.06853 <le-12 

Intercepts : 

Y 0.15417 0.11353 1.35802 0.1745 
X -0.10255 0.10257 -0.99982 0.3174 

Residual Variances: 

Y 1.14553 0.12566 9.11583 
X 1.05875 0.11837 8.94435 

AIC= 3325.163 

I | score | | "2= 0.0001193449 

We see that the mixture model gives a regression coefficient of 0.951 which 
is close to the true value. This should be compared to the biased OLS 
estimate of f3: 
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> coef (summary (lm(Y ~ X, dO))) 



Estimate Std. Error t value Pr(>|t|) 
(Intercept) 0.8785888 0.06556336 13.40061 3.472204e-35 
X 1.3675927 0.05108919 26.76873 1.804800e-98 

In fact for the given set of parameters the bias is 

h . ( o ,_a (/37i + 7y7x)Var(Z) + /3Var(6x) _ 2 , , 
bias(/3 OL s) -P 7 ^ V ar(Z) + Var(e x ) " * (80) 

and replicating the above simulation 1000 times we obtain the following 
statistics for the /3 parameter 



Estimator 


Bias 


Std.Err 


MSE 


OLS 


0.403 


0.0510 


0.165 


Mixture 


0.002 


0.0822 


0.068 



which clearly shows the advantage of the mixture regression model. 

The example generalizes to several binary confounders (2 k classes with 
k confounders, and the continuous case could likewise be approximated by 
a finite number of mixtures) and if we were suspecting an interaction effect 
between X and Z we could also have allowed the slope parameter (j3) to 
vary freely in the two classes. The difficult task of choosing an optimal 
number of components in the mixture could be based on cross-validation, 
but currently this is not implemented in lava . mixture (nor is the problem 
of adjusting the standard errors for this model selection which could be 
based on a bootstrap resampling). Still we believe that a mixture analysis 
as in the previous example could serve as an important sensitivity analysis 
in many applications. 

Two variants of the EM-algorithm are also implemented in the mixture 
function (via the argument type): CEM (Classification EM) where each 
observation in the E-step is assigned to the class with the highest max- 
imum posterior probability (1731) . and StEM (Stochastic EM) where each 
observation is assigned randomly to a class based on a draw from a multi- 
nomial distribution with the posterior probabilities as parameter. In both 
cases, the M-step reduces to the maximization of a simple multigroup LLVM 
(see Section Ej). The latter approach leads to a time- homogeneous Markov 
chain of the parameters, which under certain regularity conditions is ergodic 
with a normal stationary distributio n with a mean that is a co nsistent es- 



timate of the mixture parameters (ICeleux and Diebolt I . Il986l ). In some 
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situations, the StEM has been observed to have faster convergence and be- 
ing more likely to converge to the global maximum of the log-likelihood as 
the stochastic variation introduce d can push the "EM"- al gorithm away from 



a saddle-point or local maximum flDiebolt and Ip I . Il996l ). However, assess- 



ment of convergence of the Markov chain remains a challenging problem. 
In any case, the CEM or StEM algori thm might be useful f or finding good 
starting values for the EM algorithm (IBiernacki et all . 120051 ). 

In models with an unstructured mean and covariance in each class, the 
function mvnmix can be used because it exploits the fact, that the M-step 
has a closed-form solution (in fact the likelihood is unbounded, however, this 
is of more technical than practical interest as we typically can disregard the 
obviously wrong solutions to the score equations). As an example we will 
fit a two-component Gaussian mixture model to the waiting times between 
eruptions and the durations of the eruptions for the Old Faithful geyser in 
Yellowstone National Park: 

> data (faithful) 

> mixff <- mvnmix(faithful, k = 2) 

> (s <- summary (mixff ) ) 



Cluster 1 (n=97, Prior=0 . 3559) : 



Estimate 

Intercepts : 

eruptions 2.03639 
waiting 54.47852 

Residual Variances: 

eruptions 0.06917 
eruptions , waiting 0.43517 
waiting 33.69728 



Std. Error Z value Pr(>|z|) 

0.03495 58.27283 <le-12 

0.62846 86.68510 <le-12 

0.01074 6.43961 

0.17984 2.41979 0.01553 
5.77754 5.83246 



Cluster 2 (n=175, Prior=0 . 6441) : 



Intercepts : 

eruptions 

waiting 
Residual Variances: 

eruptions 

eruptions .waiting 

waiting 



Estimate Std. Error 

4.28966 0.03349 
79.96812 0.47131 

0.16997 0.02112 
0.94061 0.18679 
36.04621 4.09090 



Z value Pr(>|z| ) 

128.06900 <le-12 
169.67075 <le-12 

8.04628 

5.03562 4.763e-07 
8.81132 
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AIC= 2282.528 

I Iscorel | "2= 8.611289e-16 




Figure 10: plot (mixf f , col="darkblue" , "orange") : 2-component Gaussian Mixture 
Model fit of Old Faithful data (95% prediction regions for each mixture distribution 
and each observation where assigned the color of the mixture with the highest posterior 
probability) 



9.3. Binary data 

In epidemiology binary data are among the most common types of end- 
points and often correlated binary data are collected. Several methods 
hav e been proposed t o dea l with this sort of data, e.g. marginal mod- 



els (ILiang and Zegerl . Il986l ). conditional maximum likelihood estimation 



(lAndersenl . Il97ll ) or numeri cal integration to obtain the marginal likeli 



hood of the observed da ta (IPinheiro and Chad . 120061 ). Via the package 
lava.tobit (jHolstl . 120111 ) we can estimate LLVMs where a subset of the 
endogenous variables are binary using a probit-link (only subtle differences 
with a logit-link resulting in a scaling of roughly 1.7 of the parameters). 

As an example we will look at a simple factor analysis model with a 
covariate 



p(Ky = i\m,x i ) = $( f ji j + \jTi i 

r]i = >yXi + d, 



51) 
52) 



where Q ~ A/"(0, a 2 ), and $ is the standard normal cumulative distribution 
function. 

There are several ways to specify this model in lava, but here we will 
use the binary function: 
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> mprobit <- lvm(list(c(yl , y2, y3) ~ eta, eta x)) 

> latent (mprobit) <- "eta 

> binary (mprobit) <- endogenous (mprobit) 

> set . seed(l) 

> dprobit <- sim(mprobit , 500) 

With the binary call the endogenous variables, Yy, are changed from being 
continuous to being dichotomous, where we assume that there exists a latent 
conditionally normally distributed variable, Y*-, such that 



Y 



The MLE is obtained as usual 



Y$>0 



(83) 



> lava. options (param = "hybrid", trace = 1, method = "NR") 

> (eprobit <- estimate (mprobit, dprobit) ) 



Estimate Std. Error Z value Pr(>|z|) 



Measurements : 

y2<-eta 

y3<-eta 
Regressions : 

eta<-x 
Intercepts : 

yi 

y2 
ys 

Residual Variances: 
eta 



0.78017 
0.92687 

1.09228 

0.12358 
-0.06024 
0.04414 

1.32052 



0.15378 5.07336 3.909e-07 
0.18950 4.89114 1.003e-06 

0.15999 6.82737 8.649e-12 



0.09374 1.31829 
0.08104 -0.74330 
0.08930 0.49432 

0.40634 3.24979 



0.1874 
0.4573 
0.6211 



Another interesting example is the logit-probit-normal model (ICaffo and Griswoldl . 



20061 ) which is a conditional model with marginal fixed effects on a logit 
scale. E.g. a random intercept model with a single covariate Xy can be 
formulated as 



P(r u = l|r /j ,X u ) = $U 



,-1 



1 



1 + exp(— /i — /3Xi 



a- 



where a 2 is the variance of the random effect rji. The nonlinear parameter- 
ization ensures the condition 



logit P(^ = 1 | X ij )=fi + f3X, 
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J5) 



Specification is straightforward using the constrain method, and stan- 
dard likelihood theory can be applied on this model in contrast to the gen- 
eralized estimating equation framework, e.g. likelihood ratio testing, profile 
likelihood confidence limits, and analysis with data missing at random as 
described in Section [71 

As an example we simulate observations from a simple random intercept 
probit model 

> margin <- lvm(c(yl , y2) ~ x) 

> regression(margm, c(yl, y2) ~ u) <- "gamma" 

> intercept (margm, endogenous (margm) ) <- "mu" 

> binary (margm) <- endogenous (margm) 

> covariance (margm, ~u) <- 1 

> latent (margm) <- ~u 

> dmarg <- sim(margm, 100) 

The logit-probit-normal model is then specified as 

> regression(margm, c(yl, y2) x) <- 

> constrain (margm, mu ~ x + alpha + beta + gamma) <- function(z) qnorm(l/(l + 
+ exp(-z[2] - z[3] * z[l]))) * sqrt(l + z[4]~2) 

and estimates are obtained in the usual way 

> (emargm <- estimate (margm, dmarg)) 



9.4- Censored data 

Censoring is a complication that is often encountered in cohort studies 
but can also be seen in experimental studies where thresholding of measure- 
ments may occur due to limiting precision of laboratory equipment. 

We will assume that the data-generating mechanism is defined by ([3]) and 
(j4]) and that a subset of the endogenous variables Y*j, j G J are censored 
such that for given censoring times Cj, j G J we only observe 



Estimate Std. Error Z value Pr(>|z|) 



Measurements : 

yl<-u 
Intercepts : 



2.08070 



0.57583 3.61339 0.0003022 



alpha 
beta 



0.04207 
1.35056 



0.20986 0.20049 0.8411 
0.25772 5.24039 1 . 602e-07 




(86) 
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As an example we will simulate data from a regression model (see Figure 
[9]), where we have a single dichotomous mediator Z (P(Z = 1) = 0.5), which 
have a direct linear effect on the outcome of interest Y 

Y = hy + 0X + lY Z + e Y (87) 

and on the predictor X, which we assume is conditionally normally dis- 
tributed given Z 

X = fi x + lx Z + e x (88) 

In our simulation we will let all intercepts be zero, residual variances 1, and 
slopes as defined by Figure [9j 

> medO <- lvm(list(Y ~ X + Z, X ~ Z)) 

> distribution(medO, ~Z) <- binomial . lvm() 

> dO <- simhnedO, p = c(~Y<-Z~ = 2), n = 500) 

We further assume we only observe the thresholded versions of Y and 

X: 

> dtobit <- transform(dO, X = as . factor (Of > 0) * 1) , Y = Surv (pmin (Y , 
+ 2), Y < 2)) 

As X is coded as a factor and Y is coded as a right-censored Surv-object 
(combinations of left and right censoring are allowed) the estimate method 
automatically applies a Probit and Tobit model respectively 

> (etobit <- estimate (medO, dtobit)) 





Estimate 


Std. 


Error 


Z 


value Pr (> | z | ) 


Regressions : 
















Y<-X 


0. 


. 94429 


0, 


.07790 


12. 


.12215 


<le-12 


Y<-Z 


2. 


.04975 


0, 


.13527 


15. 


. 15247 


<le-12 


X<-Z 


1, 


,00076 


0, 


,12154 


8, 


,23419 


<le-12 


Intercepts : 
















Y 


-0. 


,02887 


0, 


, 07733 


-0. 


.37334 


0.7089 


X 


-0. 


,07362 


0, 


,07994 


-0. 


.92096 


0.3571 


Residual Variances: 
















Y 


0, 


,97204 


0, 


,10225 


9, 


,50687 





The Probit/Tobit model framework also has important applications in 
the causal modeling framework where it allows us to elegantly define direct 
and indirect effects for binary and censored data in complex path analyses 
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(IDitlevsen et all . 120051 ). The interpretation in this setup is linked to the as- 
sumption that the observations are generated by an unobserved continuous 
variable following a conditional normal distribution, and the indirect effects 
can therefore be quantified via the effects method: 

> effects (etobit, Y ~ Z) 

Total effect of 'Z' on 'Y' : 

2.994759 (Approx. Std.Err = 0.1509939) 
Direct effect of 'Z' on 'Y' : 

2.04975 (Approx. Std.Err = 0.135275) 
Indirect effects: 

Effect of 'Z' via Z->X->Y: 

0.9450087 (Approx. Std.Err = 0.1358391) 

9.4-1- Inverse probability weights 

For models with complex sampling (survey data) and as an alternative 
method to deal with censored or missing data it is con venient to introduce 



Inverse Probability Weights in the e stimation procedure flHorvitz and Thompson . 



19521 ; iRotnitzky and Robinsl . Il995f ). The estimating equations in this situ- 



ation becomes 



00' i \ — V s1 " 1 

- %\Y, - (1< - £„,i)'fVJ Wj J (89) 
+ 2p=g^)Vw,«-W}=0 

where Wj is the weight-matrix for the ith observation, and and Qg 
are the model-specific mean and covariance matrix of Yi given covariates 
Z % = (X>, V/, W{)'. 

In lava, equation fl89|) can be solved with a diagonal weight matrix 
using the estimator weighted and using the argument weight with the 
estimate method. The weight argument should be either a matrix with 
the weights of the endogenous variables of the model (or a named matrix 
with a subset of the variables) or alternatively a character vector with the 
names in the data. frame that corresponds to weig htffl (the weights are then 



x For the lava.tobit package the weight argument is already reserved and the 
weight 2 argument should be used instead and further the estimator should not be 
changed from the default. 
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assigned to the variables in the models in the order they appear, see e.g 
the vars function). For multigroup models a list of matrices or character- 
vectors is expected. 



9.5. Instrumental variables 

In econometrics Instrumental Variable (IV) estimators are popular tools 
for dealing with the problem of covariates that are correlated with the resid- 
ual term of the response variable. In this situation ordinary linear regression 
analysis will yield biased estimates. The idea is to identify an IV, which is 
a variable fulfilling the conditions 

1. Correlated with the problematic covariate, X, given all other covari- 
ates 

2. Uncorrelated with the residual error, e, of the response, Y. 

The estimator can be then be formulated as Two-Stage Least-Squares (2SLS) 
approach. In the first step, regress X on the IV(s) and obtain the predicted 
covariate, X. In the second stage, regress Y on X (and other coyariates) 



Cons istency follows under very weak assumptions flGreend . 120021 ; lAngrist 



20011 ) 



The method can also be applied to estimate parameters in SEMs (IBollen 



19961 ). In the following we will assume that the model of interest is a SEM 
(no random slopes, single group) without linear or non-linear parameter 
constraints 

Y = v + Ary + KX + e, (90) 



r) = a + Br] + rX + C, (91) 

with a zero diagonal of B. We further assume that we have at least one 
indicator for each latent variable. This means that there exists a subset 
of endogenous variables Y of Y, with no other predictors than the latent 
variable. From these we can identify the latent variables 

rj = Y - e. (92) 

Substituting f )92|) into (190]) and fl9Tj) we obtain the following equation for 
one of the endogenous variables, Yj (not part of Y): 

Y 3 = v d + AjY + K 3 X + ej - Aje, (93) 



and similarly for the surrogate 



Y r = a r + B r Y + T r X + ( r +Z r - B r e. (94) 



Equations (153"]) and (|94|) only includes observed predictors but parameters 
cannot be estimated using OLS because of obvious correlations between 
predictors and residuals. Instruments have to be identified that are uncor- 
rected with the residual terms (u r or Uj) in f l93]) - fl9~4"|) . while at the same 
time being correlated with the part of Y that are entering the equation 
as predictors. In lava these conditions are checked automatically via the 
model-implied covariance structure, selecting the largest possible set of in- 
strument variables for each equation. To implicitly select the instruments 
for the different regression equations, covariance between specific residual 
terms should be added to the model structure via the covariance method 
(thereby indirectly disqualifying a varia ble as an inst rument candidate). As 



all equations are solved simultaneously ( IBollenl . 120011 ) the covariance matrix 
of the estimates are available (i.e. vcov) and Wald tests via the compare 
method are possible. Consistent estimates of the variance parameters, are 
obtained via MLE in the model with all other parameters fixed at the IV 
estimates (can be disable with the control parameter variance=FALSE). 

To illustrate the method we simulate data from a complex latent model 
with three measurement models (see Figure fTTl) : 



> mlV <- lvm() 

> regression(mlV) <- c(yl, y2, y3) ~ etal 

> regression(mlV) <- c(vl, v2, v3) ~ eta.2 

> regression(mlV) <- c(zl, z2, z3) ~ eta.3 

> latent (mlV) <- "etal + eta2 + eta3 

> regression (mlV) <- etal ~ eta2 + eta3 + x2 

> regression(mlV) <- eta2 ~ eta3 + xl 

> regression(mlV) <- eta3 ~ xl 

> covariance (m I V) <- yl ~ vl 

> regression(mlV) <- y2 ~ x2 

> regression (mlV, c(yl[0], vl [0] , zl[0]) ~ etal + eta2 + eta3) <- c(l, 
+ 1, 1) 

> dIV <- simbnIV, 1000) [, manifest (mlV)] 



In this model we explicitly defined yl, vl and zl as the indicators in (|92|) 
setting the factor loadings to one and intercepts to zero. To apply the 
instrumental variable estimator using the model-implied instruments, we 
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Figure 11: plot(mlV) 



simply add the argument estimator="IV" to the estimate function, and 
as we have already chosen the indicator-set we set f ix=FALSE: 

> summary (elV <- estimate (mlV, dIV, estimator = "IV", fix = FALSE)) 

Latent variables: etal eta2 eta3 
Number of rows in data=1000 



Estimate Std. Error Z value Pr(>|z|) std.xy 

Measurements : 



Reg: 



yl<-etal 


1. 


,00000 














0. 


.96794 


y2<-etal 


0. 


,99358 


0, 


,01199 


82. 


.89643 


<le- 


12 





.89359 


y3<-etal 


0, 


,98599 


0, 


,01175 


83. 


.90153 


<le- 


12 


0. 


,96603 


vl<-eta2 


1. 


,00000 














0. 


,91585 


v2<-eta2 


0, 


,99870 


0, 


.01972 


50. 


.64819 


<le- 


12 


0. 


,92661 


v3<-eta2 


1. 


,02062 


0, 


.02023 


50. 


.45033 


<le- 


12 


0. 


,92472 


zl<-eta3 


1, 


,00000 














0. 


,81644 


z2<-eta3 


0, 


, 94467 


0, 


,03249 


29. 


.07141 


<le- 


12 


0. 


,80369 


z3<-eta3 


0. 


.98653 


0, 


.03382 


29. 


.17123 


<le- 


12 


0. 


.81647 


;ressions : 






















y2<-x2 


0, 


,98771 


0, 


,04606 


21. 


.44170 


<le- 


12 


0. 


,22739 


etal<-eta2 


1, 


.05111 


0, 


,06144 


17. 


. 10780 


<le- 


12 


0. 


, 64088 


etal<-eta3 


0, 


,89998 


0, 


,10608 


8, 


.48410 


<le- 


12 


0. 


,32826 


etal<-x2 


0, 


,97850 


0, 


.05490 


17. 


.82364 


<le- 


12 





,25048 


eta2<-eta3 


0, 


.90539 


0, 


.06139 


14. 


.74821 


<le- 


12 





,54161 


eta2<-xl 


1, 


,05768 


0, 


,07972 


13. 


.26707 


<le- 


12 


0. 


,44770 


eta3<-xl 


0, 


,97241 


0, 


, 04543 


21. 


.40496 


<le- 


12 


0. 


,68806 
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Intercepts : 



Res 



yi 


. 


. 00000 












, 


. 00000 


y2 


-0. 


.03743 


0. 


. 04464 


-0. 


.83837 


0.4018 


-0. 


,00861 


y3 


-0 , 


, 07527 


. 


. 04534 


-1 . 


.66018 


0.09688 


-0 . 


, 01886 


etal 


, 


, 01279 


, 


. 05512 


. 


. 23202 


0.8165 


, 


, 00327 


vl 


. 


. 00000 












, 


. 00000 


v2 


-0 , 


. 03401 





. 04561 


-0, 


.74560 


0.4559 


-0 , 


. 01323 


v3 


-0. 


.02803 





,04679 


-0, 


.59910 


0.5491 


-0. 


.01065 


eta2 


. 


, 09849 


0. 


,05343 


1 . 


.84351 


0.06526 


, 


. 04131 


zl 


. 


, 00000 















. 00000 


z2 


-0 


.00774 


0. 


.04259 


-0. 


. 18173 


0.8558 


-0. 


. 00462 


z3 


0. 


. 02848 


0. 


, 04488 


0. 


.63450 


0.5258 


0. 


.01652 


eta3 


0. 


,05345 


0. 


.04585 


1 . 


. 16578 


0.2437 


0. 


.03747 


;idual Variances: 




















yi 


1 , 


. 02973 















. 06310 


yl.vl 


0, 


.53963 












0. 


,05131 


y2 


0. 


.98617 












0. 


,05217 


ys 


1. 


.06375 












0, 


.06678 


etal 


1, 


.01587 












0, 


.06644 


vl 


1. 


.09261 












0. 


,16123 


v2 


0. 


.93365 












0. 


,14140 


v3 


1. 


.00333 












0. 


. 14490 


eta2 




.98078 












0. 


. 17254 


zl 


1. 


.01747 












0. 


.33342 


z2 


0. 


.99507 












0. 


.35408 


z3 


0. 


.99004 












0. 


.33338 


eta3 


1, 


,07110 












0, 


.52657 



Estimator: IV 



Latent variables etal , eta2 , eta3 
Surrogate variables: yl,vl,zl 

Response Instruments 

y2 y3,v2,v3,zl,z2,z3,x2,xl 

y3 y2,v2,v3,zl,z2,z3,x2,xl 

etal v2,v3,z2,z3,x2,xl 

v2 y2,y3,v3,zl,z2,z3,x2,xl 

v3 y2,y3,v2,zl,z2,z3,x2,xl 

eta2 z2,z3,x2,xl 
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z2 
z3 

eta3 



yl , y2 , y3 , vl , v2 , v3 , z3 , x2 , xl 
yl , y2 , y3 , vl , v2 , v3 , z2 , x2 , xl 
xl,x2 



The IV estimator has the advantage of being a non-iterative procedure 
and requires weaker assumptions than MLE. There are also indications that 
IV estimators are performing well in low sample-sizes an d are generally mor e 



robust to structural model mis-specification than MLE (jBollen et all . 120071 ) . 
Where applicable an IV analysis is therefore a good choice of method for 
analyzing the model fit of a structural equation model fitted with MLE (e.g. 
based on a Hausman type test statistics). 

10. Graphics 



Function 



Task 



plot Plots path diagram of model 

labels Defines labels for variables 

edgelabels Defines labels and style for edges of the graph 

nodecolor Defines color and style of nodes/variables 

Graph Extracts graph (graphNEL object) 



Table 5: Graphics functions. 



A plot method is available for both the lvm, multigroup and lvmf it 
classes. Plotting a lvmf it object shows the user whether lava has linked the 
model to the data as intended. A common mistake is that a variable name 
in the model specification does not appear in the data. In this case lava 
consider the corresponding variable to be latent which is easily identified 
from the plot. 



Layout and rendering of the graphs are achieve d via Rgraphviz (IGentrv et al 



2009), which in combination with the tikzDevice ( iSharpsteen and Bracked . 
2010l ) makes it possible to produce publication quality path diagrams. 

In the following we will use model ml defined in Section 13.11 as the ex- 
ample. To enhance the graph we will add some labels to the nodes using 
the function labels H 



2 for more information on mathematical annotation in R we refer to the plotmath 
help- page. 
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> labels(ml) <- c(ul = expression (eta [1] ) , u2 = expression(eta[2] )) 

Similarly, subscripted versions of the observed variables could be defined 
but we will keep them as they are for now. The labels of an object can be 
examined with labels (m). 

Labels of edges can be defined with the edgelabels-function, e.g. to 
define new labels, colors and line width for the edges from x\ to U\ and from 
u i to 2/11 we call 

> ml <- mO 

> edgelabels (ml , ul~xl, lwd=3, col="blue") <- expression (beta [1] ) 

> edgelabels (ml , yll~ul, lwd=3, col="red" , 

+ labcol="red") <- expression(beta[2] ) 

In addition the argument labels=TRUE can be parsed to the plot method 
to add parameter names (named pl,p2,... if no names were previously given 
using, e.g., regression) to the edges of the plot. This will override (but 
not delete) previously defined edge label attributes. 

The color of the nodes is automatically added to the graph. To disable 
this functionality the argument addcolor=FALSE should be parsed to the 
plot-method. New coloring and style can be added via nodecolor: 

nodecolor (ml) <- "indianred" 
nodecolor (ml , ~yll+yl2+yl3 , 

+ labcol="white" , lwd=c(3,l,l)) <- "lightblue" 

nodecolor (ml , ~xl+x2, labcol="red" , 

+ border=c ( "black" , "white")) <- "white" 

10.1. Graph Attributes 

Specific attributes of the graph such as font size, line width etc. can 
also be controlled via the nodeRenderlnf o and edgeRenderlnf o functions 
called on the graph object in a lvm-object accessible via the Graph function 
(also available for lvmf it-objects). E.g. to set the font size to 2 of all 
edge-labels (see Figure H2J) we would write 

> edgeRenderlnf o (Graph (ml) )$cex <- 1.5 

Methods for visualizing lvmf it are also available. As an example we will 
extract the pathways from xl to z3 with the path method and highlight 
and label the corresponding arrows in the path-diagram (see Figure [T3|) : 
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Figure 12: plot (ml) 



Graph (e) <- Graph (e, add=TRUE) 

labels (Graph (e) ) <- c(ul=expression(eta[l] ) , u2=expression(eta[2] )) 
mypath <- path(e, z3~xl) 

edgeRenderInfo(Graph(e))$lwd[unlist(mypath$edges)] <- 2 
edgeRenderlnfo (Graph (e))$label <- NA 

edgelabels(e, edges=unique (unlist (mypath$edges) ) , cex=l.l) <- 
+ c("beta[l]", "gamma[l] " , " gamma [2] " , "beta[2]") 

10.2. Graph Layout 

Several automatic graph layout algorithms are available through the 
layout Type argument (see Figure [H]). Additionally, the graph (obtain- 
able via the Graph method) can be saved with the doDot function and be 
processed in an external program supporting the dot-format (graphviz). 
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11. Application: Brain serotonin transporter imaging data 



We consider 54 observations from ( Kalbitzer et all . l2010l ) with measure- 



ments of the serotonin transporter (SERT) in the human brain as measured 
by Positron Emission Tomography (PET) techniques. The outcome of inter- 
est, SERT, was quantified as binding potential of the specific tracer binding 
(BPjvd) in four regions of interest, which a priori were identified as high- 
binding and reliable measurements of SERT. The serotonergic system has 
been suggested to have a strong impact on mood and as a potential pre- 
dictor of the development of seasonal affective disorder (SAD). The aim of 
the original study was to explore the association between levels of SERT 
and the interaction between seasonality and a repeat polymorphism in the 
promoter region of the serotonin transporter gene (5-HTTLPR). This was 
achieved by linear regression on each regional outcome. However, these data 
are characterized by high inter-regional correlation, and in the following we 
will supplement the original analysis with a multivariate analysis taking this 
aspect into account. It has been demonstrated that SERT levels respond 
to chronic changes in brain serotonin (5-HT) levels as suggested by studies 
of the effect of SSRI treatment and in animal studies. It has therefore been 
suggested that the co mmon regulator of SERT is this underlying 5-HT level 



(lErritzoe et all . 120101 ) which, however, cannot be measured directly in vivo. 
This biological model can be captured by a structural equation model, with 
a simple measurement model describing the four regions of interest, and 
a structural model in which exogenous variables affects the SERT mea- 
surements indirectly through the intermediate latent variable as shown on 
Figure [15j 

We chose to model the seasonal effect using a harmonic function with 
a period of one year, described by the amplitude, A, and the translation 
parameter, 5 (time of peak): 

.'2n(t-5)\ A f2n5\ f2nt\ A f2n5\ ( 2ixt 
A cos — — — — = A cos — — cos — — + Asm — — sm 



365 J V 365 / V 365 / V 365 / V 365 

V v ' V «, ' 

(95) 

where t is the date of the scan. This parameterization approximately embeds 
a simpler model, where the seasonal effect is described as a linear function 
of the amount of daylight minutes on the date of the PET scan. However, 
the harmonic curve adds the flexibility of letting the time of peak be a free 
parameter which allows a possible delayed seasonal effects on the serotonin 
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Figure 15: Brain regions of interests and initial SERT model. 

transporter to be taken into account. In lava the seasonal effect could 
be modeled directly (via constrain) as the left-hand-side of fl95|) . thus 
directly quantifying A and 5. Here we use the linear parameterization on 
the right-hand-side using the cosine and sine transformed time variables as 
predictors. As in the original study, we also adjust for possible main effects 
of age, gender and the 5-HTTLPR polymorphism (dichotomized as carriers 
of the (short) s- variation vs. non-carriers (long-long alleles)) 

> httmod <- lvm(c(cau,th,put ,mid) ~ eta) 

> regression(httmod) <- eta ~ f(cos,bl)+f(sin,b2)+age+sex+gene 

As described in Section [61 we assessed the model fit via residual plots, 
score tests and the global x 2 -test, and concluded that the local independence 
between the midbrain and thalamus region was not plausible. Significantly 
different age and gender effects on caudate nucleus were also identified in 
this process and therefore added to the model. 

> regression(httmod) <- cau~sex+age 

> covariance (httmod) <- th~mid 

No additional evidence against the model was identified, thus leading to a 
final model as illustrated in Figure [T6j 

> tikz(f ile="sertseml . tex" ,8, 8, standAlone=TRUE) 

> m <- lvm(~cau+th+put+mid) 

> regression(m) <- c(cau,mid, th,put) "eta 

> latent (m) <- "eta; labels (m) <- c(eta="$\\eta$") 

> regression (m) <- eta~sex+age+date 
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> nodecolor(m,vars (m) ) <- 

> nodecolor (m, endogenous (m) ,labcol="white") <- 
+ c("red" , "blue" , "green" , "orange") 

> plot(m) 

> dev. off () 




Figure 16: Final SERT-seasonality model. 

Parameter estimates are obtained by maximum likelihood using a Newton- 
Raphson algorithm: 

> lava. options (method="NR" , trace=l , tol=le-12 ,param= "relative") 

> httmod.fit <- estimate (httmod,data=dasb) 

One of the aims of the analysis is to quantify the seasonal effect, therefore 
we will also estimate the translation and amplitude (unsigned) 

5 = arctan(/V/3i)365/(27r) (96) 
\A\ = \fW+¥l (97) 

> constrain (httmod . fit ,delta~bl+b2) <- function(x) atan(x[2] /x[l] ) *365/ (2*pi) 

> constrain(httmod.fit,A~bl+b2) <- function(x) sqrt (x[l] ~2+x[2] ~2) 

> summary (httmod . f it , std=NULL) 
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I I score I | ~2= 1.171417e-ll 
Latent variables: eta 
Number of rows in data= 54 







Estimate 


Std. Error 




Z value 




Pr(>|z|) 


Measurements : 
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.0000e+00 














th<-eta 
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.6834e-01 
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Q 
O 
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.4499e-04 
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f 
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Regressions : 


















cau<-age 
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■i 
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cau<-sex 
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( 




-4 


.4416e+00 
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b2 : eta<-sin 
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Residual Variances 
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.8548e-03 


2 


.0635e+00 






Non-linear constraints 
















Estimate 


Std 


. Error 




Z value 


Pr(>|z|) 




2 . 5% 97 . 5% 


delta -19.6215810 


19.5444140 - 


•l.i 


3039483 


0.3154035 -57.! 


9279285 18.6848 


A 0.1136378 


0.0426363 


2.i 


5652834 


0.0076923 


0.1 


D300722 0.1972 



Estimator: gaussian 



Number of observations = 54 
Log-Likelihood = -57.88782 
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BIC = 223.2812 
AIC = 155.7756 

log-Likelihood of model = -57.88782 
log-Likelihood of saturated model = -46.94523 

Chi-squared statistic: Q = 21.88517 , df = 14 , P(Q>q) = 0.08100473 



As an alternative to the Wald test we can conduct a LRT to test the 
significance of the seasonal parameters: 

> httmodO <- httmod; kill (httmodO) <- ~cos+sin 

> httmodO . fit <- estimate (httmodO, dd, control=list (start=coef (httmod. fit))) 

> compare (httmodO . fit , httmod. fit) 

Likelihood ratio test 

data: 

chisq = 12.3978, df = 2, p-value = 0.002032 
sample estimates: 

log likelihood (model 1) log likelihood (model 2) 
-64.53133 -58.33241 

Thus, we find a highly significant seasonal effect (p = 0.002). The 
estimated time of peak and 95% Wald confidence limits can be quantified 

as 

> format (as . Date ( "2010-1-1 ") +constraints (httmod .fit) ["delta " ,c(l,5:6)] , 
+ "XdXb") 

Estimate 2.5% 97.5°/, 
"12Dec" "04Nov" "19 Jan" 

and with an estimate around middle of December this is in reasonable agree- 
ment with a suggested peak around winter solstice (about December 21). 
The parameter estimates of the initial model (Figure [15]) yielded very similar 
seasonal parameter estimates and confidence limits. 

Next, we will examine the interaction between season and the 5-HTTLPR 
polymorphism. This is done via a multigroup analysis and hence we divide 
the data into two groups defined by the genetic variable 

> dl <- subset (dd, G=="LL") 

> d2 <- subset (dd, G=="nonLL") 
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A standard multigroup analysis could be conducted where the parameters 
0i and 02 would be allowed to vary freely in the two groups. However, a 
more biological plausible model is to fix the translation parameter 5 in the 
two groups to be the same and let the amplitude A be free. This can be 
implemented via the left-hand-side of (1951) and the constrain-method 

> m <- baptize (kill (httmod, ~G+cos+sin) ) 

> intercept (m, ~cau+eta) <- list (0, "mul ") 

> regression (m,cau~eta) <- 1 

> regression(m,eta~Day) <- 

> m2 <- m 

> intercept (m2, ~eta) <- "mu2" 

> mycos <- function(x) x[2]+x[3]*cos(2*pi*(x[l]-x[4])/365) 

> constrain (m,mu~Day+nul+Al+delta) <- mycos 

> constrain (m2,mu2~Day+nu2+A2+delta) <- mycos 

Here we explicitly chose caudate nucleus as our reference region by fixing 
the factor loading to 1 and intercept to 0. The intercept parameters nul 
and nu2 describes the main effect of the genotype. The significance of the 
interaction can then be examined with a LRT against the first model 

> httmod. 2 <- estimate (list (m, m2) , list (dl,d2)) 

> compare(httmod.2,httmod.fit) 

Likelihood ratio test 

data: 

chisq = 4.7267, df = 1 , p-value = 0.0297 
sample estimates: 

log likelihood (model 1) log likelihood (model 2) 
-55.52446 -57.88782 

The estimated amplitude parameters with 95% confidence limits are 

> conf int (httmod. 2) [c("Al" , "A2") ,c(l ,3,4)] 

Estimate Pr(>|z|) 2.5% 
Al 0.01672275 0.691577803 -0.06589577 
A2 0.14670814 0.002547141 0.05142227 

and the difference in amplitude can be quantified as 

> constrain (httmod. 2, dA~Al+A2) <- diff 

> constraints (httmod. 2) ["dA", ] 
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Estimate Std. Error Z value Pr(>|z|) 2.5% 97.5% 

0.129985393 0.061819428 2.102662508 0.035495282 0.008821541 0.251149245 



with peak time around January 1: 
coef (httmod . 2) [[1]] ["delta",] 



Estimate Std. Error Z value Pr(>|z|) 
0.24805873 17.55228131 0.01413256 0.98872422 



Hence we see a statistical significant difference in amplitude between the 
two genotypes (p = 0.03, 95% confidence limits [0.01; 0.25]), with carriers 
of the short 5-HTTLPR allele having on average a higher seasonal variation 
in SERT binding. 

To visualize the harmonic curves of the two genotypes we need to predict 
the parameters mul and mu2. This can also be achieved with constraints 
where we supply the argument idx indicating which non-linear parameter 
to extract. For a multigroup model we also need to specify which group 
to extract this parameter from via the argument k. For instance predicting 
the two means on each day of the year with 95% confidence limits we can 
write 

> mul <- constraints (e ,k=l , idx="mu" ,data=data. frame (Day=l : 365) ,level=0 .95) 

> mu2 <- constraints (e,k=2, idx="mu2" , data=data. frame(Day=l :365) , level=0. 95) 

See Figure [T7| for a plot of these two curves. 

This study shows that S-allele carriers (nonLL) on average have a much 
more varying SERT binding level which could suggest a decreased serotonin 
concentration in the winter in this group. This could be caused by the 
decreasing amount of daylight in this period in the study country (Denmark) 
and supplements previous studies showing that S-allele carriers have higher 
risk of developing SAD. A d etailed discussion can be found in the original 



paper ( iKalbitzer et all 120101 ) . 



12. Conclusion 

A package lava has been developed which covers the classical covariance 
structure analysis and which hopefully can serve as a platform for method- 
ological development in the field of structural equation models and related 
models. 

The key features of the package is 
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Figure 17: Estimated harmonic curve of nonLL and LL group with 95% point-wise 
confidence limits. 

1. Easy interactive specification and visualization of complex models 

2. Simulation routines (for a broad class of models beyond the LLVM) 

3. Extensions to binary and censored data via lava.tobit 

4. Multigroup analyses 

5. MLE with data missing at random 

6. Non- linear parameter constraints and covariate effects 

7. Asymptotically correct standard errors for clustered correlated data 

Further, the program is built up around a series of modules (optimizers, 
estimators, plot hooks, simulations hooks, pre and post estimation hooks) 
which should ensure that future extensions can be written quite easily. This 
has been one of the main aims during the development of the lava package. 

Additional extensions of the package is currently under preparation in- 
cluding non-linear random effects and MLE for a broader class of the expo- 
nential family with estimation based on adaptive quadrature rules. In this 
process, we are preparing to export loop-intensive parts of the program to 
C++, which also should give a considerable computational and memory (via 
call-by-reference) improvement for some of the closed-form likelihood mod- 
els (e.g. random slope models) and models with large number of variables. 

If you use lava please cite this paper and the R software in publications. 
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Appendix A. Some zero-one matrices 



In this section we will define a few matrix-operators in order to define 
various conditional moments. Let £? G IR nxm be a matrix, and define the 
indices x = {xi, ...,x k } G {l,...,n}, and y = {y u ...,yi} G {1, . . . , m}. 
We define J nx = J x G R( n - fe ) x " as the n x n identify matrix with rows x 
removed. E.g. 



'6,(3,4) 



/ 1 








1 











To remove rows x from B we simply multiply from the left with J n)X . If 
we in addition want to remove columns y we multiply with the transpose 
of J nt y from the right: 

(A.l) 



We will use the notation J to denote the matrix that removes all latent 
variables (77) from the vector of all variable, U as defined in ([8]). We denote 
Jy the matrix that only keeps endogenous variables (Y). 

We also need an operator that cancels out rows or columns of a ma- 
trix/vector. Define the square matrix p nx G ]R nxn as the identity-matrix 
with diagonal elements at position x canceled out: 



E.g. 







i =j 


i 


£ X 


=1 




else. 






/ 1 











o\ 





1 


















































1 
















1 / 



(A.2) 



P6,(3,4) 



To cancel out rows x and columns y of the matrix B G R ri><m we calculate 

p n , x Bp' my . 

We will use px and p^ x as the matrix that cancels out the rows correspond- 
ing to the index of the exogenous variables (X) respectively the matrix that 
cancels out all rows but the ones corresponding to X. 
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Appendix B. The Score Function and Information 



In this section we will calculate the analytical derivatives of the log- 
likelihood. In order to obtain these results we will first introduce the no- 
tation of some common matrix operations. Let A G R mxn , then we define 
the column-stacking operation: 



vec(A) 



\a n J 



where ai denotes the ith column of A. The unique commutation matrix, 
K- x " is defined by 



K (m ' n) vec(A) = vec(A'). 



fB.l} 



Letting be the m x n-matrix with one at position and zero 

elsewhere, then 



K (m,n) = ^2(H (i ' j) <g> HW), 

i=l j=l 



e.g. 



#•(2,3) 



/ 1 














\ 








1 























1 








1 























1 








\o 














1 / 



It should be noted that product with a commutation matrix can be imple- 
mented very efficiently instead of relying on a direct implementation of the 
above mathematical definition. 

Let A G R mxn and B G W xq then the Kronecker product is the mpxnq- 
matrix: 



/ a n B ■ ■ ■ a ln B 



A®B 



\a m iB 
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We will calculate the derivatives of (|20|) by means of matrix differential 
calculus. The Jacobian matrix of a matrix-function F: MJ 1 — > R mx P is the 
mp x n matrix defined by 



80' 

Letting d denote the differential operator (see lMagnus and Neudeckerlfll988h ). 
the first identification rule states that dvecP(0) = A(0)dO =>■ DF(0) = 

MP)- 

Appendix B.0.1. Score function 

Using the identities d log \X \ = t^X" 1 d X) and d X' 1 = -X~ 1 {dX)X- 1 , 
and applying the product rule we get 

71 71 

d£(0) = --tr^dOe) - - tr(d{En^}) (B.2) 

77 Tli - 

= --tr^dOe) + -tr(S^ 1 [dn ]^ 1 ), (B.3) 

where 

d tl e = {d G } P e G' + G {d P G' e } (B.4) 

= {d Go} P G' e + G {d P } G' e + G e P e {d G e }' (B.5) 

= {d G e } P e G' e + [{d G e } PeG' e }' + G {d P e } G' B , (B.6) 

and 

d G e = J(l m - Ae)- 1 {d A } (l m - A*,)" 1 . (B.7) 

Taking vec's it follows that 

dvecG = [((l m - Ae)- 1 )' ® G e ] dvecA d0, (B.8) 

hence by the first identification rule 

dvecGo r//-. a n-in/ dvecAg _ 

= [((l m - A,)- 1 )' ® G fl ] — (B.9) 

and similarly 

<9vecf2 e ,„ r . (k fc v r „ _ .dvecGo r _, „ n <9vecP0 
= (l fc2 + [G P ® l fc ] — ^— + [G ® G ] 

(B.10) 
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and finally (exploiting the symmetry of Qg and commutative property under 
the trace operator) we obtain the gradient 



d£[0) _ n fdvecQ e \' 
2 



89 



89' 



vcc 



n I 8 vec fie \ ' 
2 V 80' 



vec [n^ 1 ] . 

(B.ll) 



Next we examine the model including a mean structure ( 120)1 . W.r.t. to the 
first differential we observe that 

dtr {TgQg 1 } = - tr {T e fl e \d ft )ft e x } + tr {(dTe)^ 1 } . (B.12) 

Hence 

8£(9) n fdvectooY m-i^ r»-n n fdvecflgY 

- — vec \il a Tail a \ 1 

2 V 00' J 1 6 6 J 2 V 
n ( 8 vec Tq \ ' 



89 



89' 



vec 



89' 



vcc 



(no 1 ) ■ 



Further by the chain-rule 
8 vec Tq d(fl — v ) 8 vec T e 



(B.13) 



die 



89' 



89' 8(fi - = ~ [U ^ + ^ ~ ^ ® U] W> 

(B.14) 



and 



d| = (dG )u0 + Go(du e ). 
Taking vec (GgVg = lGgVg): 

d €o ( i k> i ,dvecG 8v e 

H ® Ifc) 5^ + G - 



90' 



90' 



90' 



(B.15) 



(B.16) 



We have calculated the full score but in some situations it will be useful to 
evaluate the score in a single point. The contribution of a single observation 
to the log-likelihood is 



{9 | Zi) oc -iog|n«| + -(zi -teyn^izi -Ze 



2 
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(B.17) 



or as in (120]) where we simply exchange T e with T z . g = — ie)(zi — ig)', 
hence the score is as in (IB. 131) where (1B.14[) is calculated with Zi instead of 
/x. Alternatively, letting — ig = Ug = Ug{i): 



d^u'gfl^Ug) = u'g [(2^) dug + (dflg^Ug] 

= -U'g [{2Q,g l ) dig + fig 1 {d fig) fig 1 Ug] 



(B.18) 



where we used that for constant symmetric A the differential of a quadratic 
form is 

d(u'Au) = 2u'(A) d u. (B.19) 

Hence the contribution to the score function of the zth observation is 
1 



Sid) = -- 



V6C( e >~d0> 6 6 86' 



- {u'gflg 1 ® Ugflg') — I 

If/ i> 1 1lX 9 vec fig . ^ ,d vec ig I 

- | (vec(n^) - vecin^ueu'gflg 1 )) - 2u^- 1 ^^ j 



(B.20) 



where the score-function evaluated in 6 is S{6) = Yui=i^i(^) 

Appendix B.0.2. The Information matrix 

The second order partial derivative is given by 



di{o) 
~dl~e~ 



ld_ 
2d9~ 



|vec(SZ ) — vec(ilg ugUgltg ) j — — 2u a fl 



09, 



we 



89, 



(B.21) 



Taking negative expectatio n with respect to the tru e para meter 6 we obtain 
the expected information (IMagnus and Neudeckerl . Il988l ). which get rid of 
all second order derivatives 



Z(0 O ) 



d vec fig 



06' 

die 
06' 



0=00 J 



fl 



-1 



die 



vec fie 



06' 



0=00, 



06' 
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(B.22) 
(B.23) 



= 00 , 



We will further derive the observed information in the case where the second 
derivatives vanishes in the case of the matrix functions Ag,Pg and Vg. Now 



d 2 G e = d [J(l - A e )-\dA e ){l - Ag)- 1 ] . 



Hence 
d 2 G e 



dA \i- Ae r dAe 



de 3 



dA e 

de, 



(i 



oe,. 



(B.24) 

Ag)- 1 . 
(B.25) 



Next we will find the derivative of (IB.lOj) . We let m denote the number 
of variables, p the number of parameters, and k the number of observed 
variable (e.g. Gg G ]R fcxm and the number of columns in the derivatives are 
p). We have GgPg G R fcxm and using rules for evaluat ing the differential of 
Kronecker-product (see iMagnus and Neudeckerl ((1988) pp. 184) we obtain 



dvec 

~oW {GePe 



W = (lm ® K^ k) <g> l k ) (l km ® vec l k 



dvec GgPg 



86' 



(l m g> K {k ' k) <g> 1*) (l km ® vec l fc ) x 



„ .dvecGg _ . <9vecPe 

:p ^ i fc ) „„, - + (i w ® G g ) 



ao 7 



90' 



(B.26) 



And 

d vec 
<90 



(Go (8) Gg 



d vec P# 
<90' 



9 vec P e ' 

d0> 
d vec Pe ' 

dd> 



<g) lfc2 
® lfc2 



l fem <g> vec G + vec Gg <g> 1 



9 vec G (g) G 

(l m <g> ® lfc) x 

d vec Ge 



fcm J 



dO' ' 
(B.27) 



Hence from (1B.26|) and ( 1B.27|) and using rules for applying the vec operator 
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on products of matrices we obtain 

^vecSJ" [pr^)'® (I* +**»>)] ( lm 8Jt w 9 lt)(lkm 



dOdO' 



> vec lfc) x 



(9 



+ 



„ , ,\d 2 vecG e 



90' 



^ ^, ^ \dvecG 

l km <g) vec G + vec G <g> l fcm J — — — , 

(B.28) 

with the expressions for the derivatives and second derivatives of Gq given 
flRQjl and flR25]) . Further 

<9 2 |e 9 vec , d vec G^ 9 vec <9v0 

> vec 1^ 



in 



<90<90' 80' 

( ( d vec G V , , . , 
' ® U 1 



dve_ 
W 

d vec «0 



/-. / / NN <9 2 vecG 
+ (l p ®K(8>l fc )) 

d vec G# \ ' dv 
09' ) W 



86' 



(B.29) 



+ 



and 

and 
d( 



— CT 1 - -(ST 1 R> o-l ^ VeC ^ 
^/ "0 ~~ ^ lt e > qqi ■ 



(B.30) 



n^ueu'grig 1 ) = -Vtg l {dQ, e )riQ 1 ueu' 9 Q,e + ^(d^Ve^e 1 (B.31) 
+ n^u^du;)^ 1 - n^Ueu'gCleidne)^ 1 . (B.32) 
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By using the identity vec(ABC) = (C <8> A) vec(B) several times we obtain 



86' 



- (Og 1 ® [n^tie]) 



86' 

8 vec £ 

86' 



i i ; ~ ii\ <9vecf2a 



(B.33) 
(B.34) 
(B.35) 
(B.36) 



and the second order derivative of the log-likelihood (IB.21j) now follows from 
applying the product rule with (1R28]) . f lB~29|) . flB~30|) and f lB~33]) . 
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